Abstract

This paper presents an efficient parallel algorithm for a new class of min-max problems based on the 
matrix multiplicative weight (MMW) update method. Our algorithm can be used to find near-optimal 
strategies for competitive two-player classical or quantum games in which a referee exchanges any number
of messages with one player followed by any number of additional messages with the other. This
algorithm considerably extends the class of games which admit parallel solutions and demonstrates for 
the first time the existence of a parallel algorithm for any game (classical or quantum) in which one 
player reacts adaptively to the other. 

A special case of our result is a parallel approximation scheme for a new class of semidefinite
programs whose feasible region consists of n-tuples of semidefinite matrices that satisfy a certain
consistency condition. Applied to this special case, our algorithm yields a direct polynomial-space simulation
of multi-message quantum interactive proofs, resulting in a first-principles proof of QIP = PSPACE.
Our algorithm thereby establishes a new way of solving SDPs, called the min-max approach,
in contrast to the primal-dual approach to SDPs used in the original proof of QIP = PSPACE. It also
follows from our work that several competing-provers complexity classes collapse to PSPACE, including
QRG(2), SQG, and two new classes called DIP and DQIP.



1 Introduction 



1.1 Results 

Parallel approximation of semidefinite programs and min-max problems 

This paper presents an efficient parallel algorithm for a new class of min-max problems with applications 
to classical and quantum zero-sum games and interactive proofs. A special case of our result is a parallel 
approximation scheme for semidefinite programs (SDPs) of the form 

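\begin{equation}
\begin{aligned}
\text{minimize} \quad & \operatorname{Tr}(X_n P) \\
\text{subject to} \quad & \operatorname{Tr}_{\mathcal{C}_n}(X_n) = \Phi_{n-1}(X_{n-1}) \\
& \qquad \vdots \\
& \operatorname{Tr}_{\mathcal{C}_2}(X_2) = \Phi_1(X_1) \\
& \operatorname{Tr}_{\mathcal{C}_1}(X_1) = Q \\
& X_1, \dots, X_n \succeq 0
\end{aligned}
\tag{1}
\end{equation}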

where $\operatorname{Tr}_{\mathcal{C}_1}, \dots, \operatorname{Tr}_{\mathcal{C}_n}$ are partial trace maps and $\Phi_1, \dots, \Phi_{n-1}$ are arbitrary completely positive and
trace-preserving maps.^1 It has long since been known that the problem of approximating the optimal value of
an arbitrary SDP is logspace-hard for P,^2 so there cannot be a parallel approximation scheme for all SDPs
unless NC = P. However, the precise extent to which SDPs admit parallel solutions is not known. Our
result adds considerably to the set of such SDPs. The result is stated in full generality as follows.

Theorem 1 (Informal, see Section 6 for details). Let $\mathbf{A}$ denote the feasible region of the SDP (1). There
exists an efficient parallel oracle-algorithm for finding approximate solutions to the min-max problem

\begin{equation}
\min_{(X_1, \dots, X_n) \in \mathbf{A}} \; \max_{P \in \mathbf{P}} \; \operatorname{Tr}(X_n P)
\tag{2}
\end{equation}

with an oracle for optimization over the set $\mathbf{P}$. The SDP (1) is recovered from the above min-max problem
(2) in the special case where $\mathbf{P} = \{P\}$ is a singleton set.

We also describe parallel implementations of this oracle for certain sets $\mathbf{P}$, yielding an unconditionally
efficient parallel approximation algorithm for the min-max problem (2) for those choices of $\mathbf{P}$.
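
To make the constraint chain in the SDP (1) concrete, the following is a minimal numerical sketch, not the paper's algorithm. It assumes every $X_i$ acts on a space of the form $\mathcal{C} \otimes \mathcal{V}$ with fixed dimensions and that each channel $\Phi_i$ is given by Kraus operators; the helper names (`partial_trace_first`, `apply_channel`, `constraint_violation`, `objective`) and the toy instance are hypothetical. The sketch measures how far a candidate tuple is from satisfying the equality constraints and evaluates the objective $\operatorname{Tr}(X_n P)$; positive semidefiniteness is not checked.

```python
import numpy as np

def partial_trace_first(x, d_c, d_v):
    """Trace out the first tensor factor C of an operator on C (x) V."""
    return np.einsum("ijik->jk", x.reshape(d_c, d_v, d_c, d_v))

def apply_channel(kraus_ops, x):
    """Apply the completely positive map X -> sum_k K X K^*."""
    return sum(k @ x @ k.conj().T for k in kraus_ops)

def trace_norm(m):
    """Trace norm (sum of singular values)."""
    return np.linalg.norm(m, "nuc")

def constraint_violation(xs, channels, q, d_c, d_v):
    """Total trace-norm violation of the equality constraints in (1)."""
    # Right-hand sides: Q for X_1, then Phi_i(X_i) for X_{i+1}.
    targets = [q] + [apply_channel(phi, x) for phi, x in zip(channels, xs[:-1])]
    return sum(trace_norm(partial_trace_first(x, d_c, d_v) - t)
               for x, t in zip(xs, targets))

def objective(xs, p):
    """Objective value Tr(X_n P)."""
    return np.trace(xs[-1] @ p).real

# Toy instance (assumption): n = 2, both registers are qubits, and
# Phi_1 is itself the partial trace over C, with Kraus operators <j| (x) I_V.
d_c = d_v = 2
q = np.eye(d_v) / d_v
phi_1 = [np.kron(np.eye(d_c)[j:j + 1], np.eye(d_v)) for j in range(d_c)]
x_1 = np.eye(d_c * d_v) / (d_c * d_v)
x_2 = np.eye(d_c * d_v) / (d_c * d_v)
p = np.diag([1.0, 0.0, 0.0, 0.0])

violation = constraint_violation([x_1, x_2], [phi_1], q, d_c, d_v)
value = objective([x_1, x_2], p)
```

In this toy instance the maximally mixed pair is feasible, so `violation` is zero up to rounding, while replacing `x_1` by a pure state breaks both equality constraints.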



Applications to zero-sum games 

This algorithm can be used to find near-optimal strategies for a new class of competitive two-player games 
that are moderated by a referee and obey the following protocol. 

(i) The referee exchanges several messages only with Alice. 

(ii) After processing this interaction with Alice, the referee exchanges several additional messages only 
with Bob. After further processing, the referee declares a winner. 

^1 The partial trace and complete positivity are standard notions from quantum information. A linear map from square matrices
to square matrices denotes a quantum channel if and only if it is completely positive and trace preserving.

^2 Hardness of approximation for SDPs follows from hardness of approximation for linear programming [Ser91, Meg92].
Indeed, our algorithm applies even to quantum games, in which the referee and players are free to exchange
and process quantum information. Due to the similarity with the oft-studied interactive proof model of
computation, games of this form shall be called double interactive proofs: the referee in such a game executes
a standard interactive proof with Alice followed by a second interactive proof with Bob. (This protocol is
depicted in Figure 2. See Sections 4 and 9.1 for further detail.)

If the referee is specified succinctly by circuits rather than in explicit matrix form then our parallel 
algorithm can be used to find near-optimal strategies in polynomial space (via the relation NC(poly) = 
PSPACE [Bor77]). This algorithm is optimal in that it is PSPACE-hard even to distinguish games that 
Alice can win with near certainty from games that Bob can win with near certainty. This strong form of 
PSPACE-hardness holds even in the special case of two-turn games [FK97] where the referee exchanges
only two messages synchronously with each player. 

Ordinary interactive proofs could also be cast as a special type of game in which the referee completely 
ignores Bob. Taking this view, the celebrated proof of IP = PSPACE [LFKN92, Sha92] implies a similar
hardness result: it is PSPACE-hard to distinguish interactive proofs that Alice can win with certainty from 
those which she can win with only exponentially small probability. 

Prior to the present work polynomial-space algorithms were known only for two-turn classical games 
and for quantum interactive proofs. The algorithm for two-turn games is due to Feige and Kilian [FK97]. 
Algorithms for quantum interactive proofs are presented in proofs of QIP = PSPACE [JJUW10, Wu10a].

Our result unifies and subsumes both of these algorithms. It also demonstrates for the first time the 
existence of a parallel algorithm for two-turn quantum games and for any game (classical or quantum) in 
which one player reacts adaptively to the other. 

Applications to complexity theory 

In complexity theory, our result implies the collapse to PSPACE of several classical and quantum interactive 
proof classes. Letting DIP and DQIP denote the competing-provers complexity classes associated with 
classical and quantum double interactive proofs, respectively, we have 

Corollary 1.1. DQIP = DIP = PSPACE. 

In contrast to the classical case, the competing-provers complexity class QRG(2) associated with
two-turn quantum games was not known to be a subset of PSPACE prior to the present work. A special case of
our result yields the equality 

QRG(2) = PSPACE, 

thus solving an open problem of Ref. [JJUW10]. Of course, every other complexity class whose protocol
can be cast as a double interactive proof also collapses to PSPACE, such as SQG [GW05].

In the special case of the SDP (1) our algorithm yields a direct polynomial-space simulation of
multi-message quantum interactive proofs, resulting in a first-principles proof of QIP = PSPACE. By contrast,
all other known proofs [JJUW10, Wu10a] rely on the highly nontrivial fact that the verifier and prover in
a quantum interactive proof can be assumed to exchange only three messages [KW00]. The original proof
of Jain et al. also relies on the additional assumption that the verifier's only message to the prover is a single
classical coin flip [MW05].

1.2 Techniques 

Our algorithm is an example of the matrix multiplicative weights update method (MMW) as discussed in the 
survey paper [AHK05] and in the PhD thesis of Kale [Kal07]. We also draw upon the valuable experience 
of recent applications of this method to parallel algorithms for quantum complexity classes [JW09, JUW09,
JJUW10, Wu10a]. However, our application of the MMW method differs from all previous
ones in that our algorithm is applied twice in a two-level recursive fashion. At the top level, our
algorithm makes use of the MMW method to solve a min-max problem. At the bottom level, a special
case of our algorithm is used to solve an SDP, which implements the oracle that the MMW method
requires for any min-max problem. Previously the MMW was used only in primal-dual approaches
to SDPs [AK07, Kal07, JUW09, JJUW10]. By contrast, we do not take a primal-dual approach: our SDP
solution arises as a special case of a more general min-max problem. A more detailed comparison can be
found below.

A naive approach to finding optimal strategies for competitive two-player games is to choose a natural
representation for these strategies and optimize over it. For two-turn classical games the natural representation
is a table of probabilities (a stochastic matrix). Indeed, Feige and Kilian successfully optimize over this
representation in their complicated and highly specialized precursor to the MMW that solves two-turn classical
games in polynomial space [FK97].

For two-turn quantum games a strategy is naturally represented by a quantum channel. For more
complicated games such as double quantum interactive proofs the most natural representation is a quantum strategy
[GW07], which may be viewed as a special type of channel. A quantum channel is typically specified by its
Choi-Jamiolkowski matrix [Wat08, Lecture 5]. But optimizing over Choi-Jamiolkowski matrices is a task
fraught with difficulty [JUW09]; optimizing over Choi-Jamiolkowski matrices that also represent quantum
strategies can only be harder.

Fortunately, double quantum interactive proofs admit another representation for strategies that is more 
suitable for our purpose. In Kitaev's transcript representation [Kit02] the actions of a player are represented 
by a list $\rho_1, \dots, \rho_n$ of density matrices that satisfy a special consistency condition. Intuitively, these density
matrices correspond to "snapshots" of the state of the referee's qubits at various times during the interaction.
(See Figure 3.)

The key property of double quantum interactive proofs that we exploit is the ability to draw a "temporal 
line" in the interaction just after Alice's last action. Given a transcript $\rho_1, \dots, \rho_n$ for Alice, the actions of
Bob can then be represented by another transcript $\xi_1, \dots, \xi_m$. By optimizing over all such transcripts one
obtains an oracle for "best responses" for Bob to a given strategy of Alice as required by the MMW. 

Whereas the MMW in its unaltered form can be used to solve min-max problems over the domain of 
density operators, we introduce a new extension to this method for min-max problems over the domain of 
transcripts — a domain consisting of lists of multiple operators, each drawn from a strict subset of the density 
operators. The high-level approach of our method is as follows: 

1. Extend the domain from a single density matrix to a list of n density matrices.

This step is relatively straightforward: the MMW can be applied without complication to all n density 
matrices at the same time. 

2. Restrict the domain to a strict subset of density matrices. 

This step is more difficult. It is accomplished by relaxing the game so as to allow all density matrices, 
with an additional penalty term to remove incentive for the players to use inconsistent transcripts. 

3. Round strategies in the relaxed game to strategies in the original game. 

For this step one must prove a "rounding" theorem (Theorem 5), which establishes that near-optimal,
fully admissible strategies can be obtained from near-optimal strategies in the unrestricted domain 
with penalty term. 
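
As a point of reference for the steps above, the unaltered MMW over a single density operator can be sketched as follows. This is an illustrative sketch under stated assumptions, not the algorithm of this paper: it handles one density matrix rather than a transcript, includes no penalty term or rounding, and replaces the oracle by a brute-force search over a small finite set of measurement operators. The names `gibbs_state`, `mmw_min_max`, `best_response` and the parameters `eps` and `rounds` are hypothetical.

```python
import numpy as np

def gibbs_state(h):
    """Return exp(-h) / Tr(exp(-h)) for a Hermitian matrix h."""
    w, v = np.linalg.eigh(h)
    e = np.exp(-(w - w.min()))          # shift eigenvalues for numerical stability
    rho = (v * e) @ v.conj().T          # V diag(e) V^*
    return rho / np.trace(rho).real

def mmw_min_max(best_response, dim, eps=0.1, rounds=500):
    """Approximate min over density matrices rho of max_P Tr(rho P).

    best_response(rho) plays the role of the oracle: it returns a
    measurement operator P (0 <= P <= I) maximizing Tr(rho P).
    """
    loss_sum = np.zeros((dim, dim), dtype=complex)
    rho_avg = np.zeros((dim, dim), dtype=complex)
    for _ in range(rounds):
        rho = gibbs_state(eps * loss_sum)   # multiplicative weights update
        loss_sum += best_response(rho)      # penalize directions the oracle exploits
        rho_avg += rho / rounds
    return rho_avg

# Toy oracle (assumption): the maximizing player picks one of two projectors.
candidates = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

def best_response(rho):
    return max(candidates, key=lambda p: np.trace(rho @ p).real)

rho_avg = mmw_min_max(best_response, dim=2)
value = max(np.trace(rho_avg @ p).real for p in candidates)
```

For this toy game the min-max value is 1/2, attained by the maximally mixed state, and the averaged state returned by the sketch achieves a value close to that.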
Primal-dual MMW versus min-max MMW



It is instructive to compare the method used in the proof of QIP = PSPACE [JJUW10] with the one used
here, especially in their application to SDPs. Both methods are based on the MMW
method and appear similar at first glance, but there are significant differences between
them. The method used by Jain et al. is the so-called primal-dual approach for solving SDPs, which originates
from [AK07]. This method makes use of the duality between the primal and dual problems of any SDP
instance. Our method, by contrast, makes no use of such duality. Instead our method, which works for
min-max problems and is therefore called the min-max approach, solves SDPs as the special case in which the max
part is trivial.

Both methods require efficient oracles for certain subproblems, as well as rounding theorems that
convert approximately feasible solutions to exactly feasible solutions without sacrificing too much in the objective
function. The sets of SDPs solvable by the two methods are not known to coincide,
essentially because the existence of such an efficient oracle and rounding theorem for one method does not imply
their existence for the other. Since the existence of such an oracle and rounding theorem relies heavily on
the specific form of the SDPs under consideration, it is hard to argue that one method is better than the other in general.
Nevertheless, some advantages of the min-max approach are known [Wu10b]. For example, there exists a
generic design of an efficient oracle in the min-max approach (although the existence of a corresponding rounding
theorem is not guaranteed), and the approximately feasible solutions obtained in the min-max approach are
close to feasible solutions in terms of the $\ell_1$ norm rather than the $\ell_\infty$ norm as in the primal-dual approach.

We now consider the specific forms of SDPs in our comparison. If one rewrites the algorithm for solving
SDPs in [JJUW10] in the standard way of Kale's thesis [Kal07], one finds that the existence of
an efficient oracle for the primal-dual approach depends on additional assumptions from the complexity
model that are no longer valid in our case. Moreover, the rounding theorem for the constraints in our
problem, namely general partial trace constraints, requires that the approximately feasible solution be close to
the exactly feasible solution in the $\ell_1$ norm.^3 These difficulties make it hard to apply the primal-dual approach in
our case. Instead, we design the min-max approach and add the penalty term to facilitate the use of the MMW
method. Furthermore, our result establishes a much larger class of SDPs that admit efficient parallel
algorithms.

The Bures metric 

Finally, it is noteworthy that the proof of our rounding theorem (Theorem 5) contains an interesting and
nontrivial application of the Bures metric, which is a distance measure for quantum states that is defined in 
terms of the more familiar fidelity function. 

Properties of the trace norm, which captures the physical distinguishability of quantum states, are often 
sufficient for most needs in quantum information. When some property of the fidelity is also required one 
uses the Fuchs-van de Graaf inequalities to convert between the trace norm and fidelity [FvdG99].

However, every such conversion incurs a quadratic slackening of relevant accuracy parameters. Our 
study calls for repeated conversions, which would incur an unacceptable exponential slackening if done 
naively via Fuchs-van de Graaf. Instead, we make only a single conversion between the trace norm and
the Bures metric and then repeatedly exploit the simultaneous properties of (i) the triangle inequality, (ii)
contractivity under quantum channels, and (iii) preservation of subsystem fidelity.

^3 The SDP in [JJUW10] also has partial trace constraints. However, it is solved under the additional assumption that the
measurement matrix is invertible and has bounded condition number. This assumption ensures that only scaled identity matrices
appear in the analysis, so that the $\ell_\infty$ norm bound is sufficient. Such an assumption is invalid in our case, essentially because our
algorithm recursively calls itself as the oracle. No assumption can be made about the inputs to the oracle since they are arbitrary
instances generated during the MMW update.

Although conversion inequalities between the trace norm and Bures metric are implied by Fuchs-van de
Graaf, to our knowledge explicit conversion inequalities have not yet appeared in published literature. The
required inequalities are derived in the present paper (Proposition 3).

Organization of the paper 

The rest of the paper is organized as follows. We refer curious readers to Section 2 for further comments
on related work. Brief preliminaries are provided in Section 3, followed by the formalization of double
quantum interactive proofs in Section 4. The rounding theorem, the MMW-based oracle-algorithm, and the
implementation of the oracle for certain choices of the set $\mathbf{P}$ are described in Sections 5, 6, and 7, respectively. The
containment of DQIP inside PSPACE is proved in Section 8. We conclude with some extensions of the main
results in Section 9.

2 Further comments on related work 

2.1 Parallel approximation of semidefinite programs 

We noted earlier that there is no parallel approximation scheme for arbitrary SDPs unless NC = P. But that 
fact does not rule out the existence of parallel algorithms for interesting subclasses of SDPs.

Some of what is known about SDPs in this respect is inherited knowledge from linear programs (LPs). 
For example, Luby and Nisan describe their own precursor to the MMW that yields a parallel approximation 
scheme for so-called positive LPs where all input numbers are positive [LN93]. By contrast, Trevisan and 
Xhafa show that it is P-hard to find exact solutions for positive LPs [TX98]. 

The notion of a positive instance of an LP can be generalized to SDPs as follows. An SDP of the form 

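\[
\begin{aligned}
\text{minimize} \quad & \operatorname{Tr}(XP) \\
\text{subject to} \quad & \Psi(X) \succeq Q \\
& X \succeq 0
\end{aligned}
\]

is said to be positive if $P, Q \succeq 0$ and $\Psi$ is a positive map (meaning that $\Psi(X) \succeq 0$ whenever $X \succeq 0$).
Of course, P-hardness of exact solutions for positive LPs implies P-hardness of exact solutions for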
positive SDPs. By analogy with the Luby-Nisan algorithm for positive LPs, Jain and Watrous give a parallel 
approximation algorithm based on MMW for positive SDPs [JW09]. The algorithm is derived from a 
correspondence between positive SDPs and one-turn quantum refereed games and can therefore be recovered 
as a special case of the work of the present paper. 

Unlike the present paper, the original proof of QIP = PSPACE due to Jain et al. [JJUW10] does 
not take advantage of the transcript representation for multi-turn strategies. Instead, those authors derive a 
special SDP based on additional assumptions of the complexity class. It is not difficult to see that their SDP 
can be written in the form (1) considered in the present paper. Thus, the work of Jain et al. is also subsumed
by our algorithm. It is noteworthy that neither the SDP instance from Ref. [JJUW10] nor its generalization
(1) from the present paper is a positive SDP instance.
2.2 Algorithms for competitive two-player games

Competitive two-player games are often modeled as either a table of payouts (normal form) or a game tree
(extensive form). The extensive form model is equivalent to the refereed games model wherein the game
is specified by a referee who exchanges messages with the players and declares a winner at the end of the 
interaction. In this paper we prefer the refereed games model for its simplicity and the ease with which it 
extends to quantum games. 

The normal form is historically the most popular model, though it is not fully general like the
extensive form or refereed game models. Indeed, normal form games correspond to the very restricted class
of one-turn refereed games in which there is no communication from the referee to the players. Despite
this restriction, the problem of computing the exact value of a normal form game is logspace-hard for P
[FIKU08, FKS95]. This hardness result is striking when juxtaposed with the existence of deterministic
polynomial-time algorithms for arbitrary, multi-turn games [KMvS94, KM92].

For succinct games in which the table of payouts, game tree, or referee is specified implicitly by circuits 
the aforementioned results immediately imply EXP-completeness for the problem of computing the exact 
value of a game. By contrast, the relaxed problem of approximating the value of such a game is much more 
diverse. For arbitrary multi-turn games EXP-hardness extends to the relaxed problem of distinguishing 
games that Alice can win with near certainty from games that Bob can win with near certainty [FK97].

But the situation is much different for shorter games. Earlier we mentioned that Feige and Kilian 
gave both (i) a polynomial-space approximation scheme for succinct two-turn games, and (ii) a matching
PSPACE-hardness result valid even for weak approximations. Fortnow et al. prove that the problem of
approximating the value of a succinct game in normal form (i.e., a one-turn classical game) is complete for
$\mathsf{S}_2^p$ and they give a $\mathsf{ZPP}^{\mathsf{NP}}$ approximation scheme based on the multiplicative weights update method for
the related search problem of finding near-optimal strategies for these games [FIKU08].

All that was known of quantum games prior to the present work is that arbitrary, multi-turn quantum 
games admit a polynomial-time exact solution [GW07] and that one-turn quantum games admit an efficient
parallel approximation scheme [JW09]. For both classical and quantum games it is an interesting open
question as to whether there is a parallel algorithm for approximating k-turn games for some k > 2.

2.3 Interactive proofs with competing provers 

An interactive proof with competing provers consists of a conversation between a randomized
polynomial-time verifier and two computationally unbounded provers on some input x. One of the provers (the
yes-prover) tries to convince the verifier to accept x, while the other (the no-prover) tries to convince the
verifier to reject x. The analogy to competitive games is obvious, and for this reason such interactive proofs 
are also called refereed games. 

A decision problem L is said to admit a classical refereed game if there exists a randomized
polynomial-time referee such that: (i) if x is a yes-instance of L then the yes-prover can convince the verifier to accept
with probability at least 2/3 regardless of the no-prover's strategy, and (ii) if x is a no-instance of L then
the no-prover can convince the verifier to reject with probability at least 2/3 regardless of the yes-prover's
strategy. The complexity class of problems that admit classical refereed games is denoted RG.
Polynomial-time algorithms for game trees imply RG ⊆ EXP [KM92, KMvS94]. The reverse containment follows
from hardness of approximation for refereed games [FK97], yielding the characterization RG = EXP.

Quantum refereed games are defined similarly except that the referee is a polynomial-time quantum 
computer who exchanges quantum information with the provers. The class of problems that admit quantum 
refereed games is denoted QRG. The polynomial-time algorithm for quantum games implies QRG ⊆ EXP
[GW07]. Prior work on classical refereed games then implies

QRG = RG = EXP,

which is the competing-prover analogue of the well-known collapse QIP = IP = PSPACE for single-prover
interactive proofs [LFKN92, Sha92, JJUW10, Wu10a].

For each positive integer k the complexity classes of problems that admit k-turn classical and quantum
refereed games are denoted RG(k) and QRG(k), respectively. The results of Fortnow et al. tell us that
RG(1) is essentially a randomized version of $\mathsf{S}_2^p$. The parallel algorithm for one-turn quantum games
immediately implies QRG(1) ⊆ PSPACE [JW09]. For two-turn games, Feige and Kilian proved RG(2) =
PSPACE [FK97],^4 and the complexity of QRG(2) is an open question of Ref. [JJUW10] that is solved in
the present paper. The exact complexity of RG(k) and QRG(k) for all other k is not known.

Double quantum interactive proofs with exactly two messages per player have been called short quantum 
games. The associated complexity class is denoted SQG; it trivially contains QRG(2) and is known to 
contain QIP [GW05]. The importance of this class was diminished by the proof of QIP = PSPACE. The 
present paper establishes that SQG is also equal to PSPACE. 

Earlier we defined DIP and DQIP to be the complexity classes of decision problems that admit classical 
and quantum double interactive proofs, respectively. These classes appear quite large at first glance. For 
example, it follows immediately from first principles that DQIP contains SQG (and hence QRG(2)) as well 
as both QIP and its complement co-QIP. That this class should collapse to PSPACE could be construed 
as surprising. 

Our results also illustrate a difference in the role of public randomness between single-prover interactive 
proofs and competing-prover interactive proofs. Any classical interactive proof with a single prover can be
simulated by a public-coin interactive proof in which the verifier's messages to the prover consist entirely
of uniformly random bits and the verifier uses no other randomness [GS89]. (Public-coin single-prover
interactive proofs are also known as Arthur-Merlin games.) Extending the notion of public-coin interaction
to refereed games, it is easy to see that an arbitrary public-coin refereed game with any number of turns
can be simulated by a double interactive proof.^5 We therefore have that the public-coin version of RG is
a subset of DIP, which we now know is equal to PSPACE. Thus, by contrast to the single-prover case
where we have public-coin-IP = IP, in the competing-prover case we have public-coin-RG ≠ RG unless
PSPACE = EXP.

3 Preliminaries 

We assume familiarity with standard concepts from quantum information [NC00, Wat08]. This section
provides a table describing our notation in Figure 1, followed by a brief survey of parallel computation
(also known as NC computation) in Section 3.1. Two rarer but nonetheless simple and fundamental concepts
from quantum information are also discussed: the preservation of subsystem fidelity in Section 3.2 and the
Bures angle in Section 3.3.

^4 The class we call RG(2) is called RG(1) by Feige and Kilian. This conflict in notation stems from the fact that we measure
the length of a game in turns, whereas those authors measure a game in rounds of messages. This switch of notation was instigated
by Jain and Watrous [JW09], who required a convenient symbol for one-turn refereed games.

^5 Proof sketch: As the referee's questions to a player are uniformly random, they cannot depend on prior responses from the
other player and can therefore be reordered so that all messages with one player are exchanged before any messages are exchanged 
with the other. 
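Symbol                                        Meaning

$\mathcal{X}, \mathcal{Y}, \mathcal{XY}$      Finite-dimensional complex vector spaces. $\mathcal{XY}$ is shorthand for $\mathcal{X} \otimes \mathcal{Y}$.

$\mathrm{L}(\mathcal{X})$                     The (complex) space of all linear operators $A : \mathcal{X} \to \mathcal{X}$.

$I_{\mathcal{X}}$                             The identity operator acting on $\mathcal{X}$.

$\mathrm{Dens}(\mathcal{X})$                  The compact convex set of all density operators within $\mathrm{L}(\mathcal{X})$.

$\mathrm{Meas}(\mathcal{X})$                  The compact convex set of all measurement operators within $\mathrm{L}(\mathcal{X})$. A measurement
                                              operator is a positive semidefinite operator $M$ with $M \preceq I_{\mathcal{X}}$.

$\mathrm{U}(\mathcal{X})$                     The set of all unitary operators within $\mathrm{L}(\mathcal{X})$.

$A^*$                                         The adjoint of an operator $A : \mathcal{X} \to \mathcal{Y}$, which has the form $A^* : \mathcal{Y} \to \mathcal{X}$.

$\langle A, B \rangle$                        The standard inner product between $A, B : \mathcal{X} \to \mathcal{Y}$. Defined by $\langle A, B \rangle = \operatorname{Tr}(A^* B)$.

Figure 1: Notation and terminology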



3.1 NC and parallel matrix computations 

We denote by NC the class of promise-problems that admit efficient parallel algorithms. Since in quantum
computation every matrix is of exponential size in terms of the input size, we also need the scaled-up version
of NC, namely NC(poly). It is through the relation NC(poly) = PSPACE that we can prove
polynomial-space upper bounds.

There are several useful facts about these classes that we rely on. The first is that
the functions in these classes compose well. Thus one can design efficient parallel algorithms for each part
of a problem and then compose them to obtain an efficient parallel algorithm for the whole problem, as
long as the number of parts is bounded. Another useful fact is that many computations involving matrices,
such as singular value decompositions and matrix exponentials, can be performed by NC algorithms (see
the survey [vzG93]). Moreover, NC algorithms adapted specifically to the implementation of the matrix
multiplicative weights update method are already known (see Refs. [JUW09, JJUW10] or [GW10,
Facts 2-5]). It remains to show that some special operations in our algorithm (e.g., computing a purification of a mixed state
and computing a unitary that maps one purification to another) can also be implemented
efficiently in parallel. NC algorithms for these extra operations can be found in the previous version of
this paper [GW10, Lemmas 2-4].

The last concern about the implementation of these parallel algorithms is precision. This
issue arises when some of the computations must be truncated to finite precision because irrational numbers
are involved. A similar issue arises when one composes approximate computations. Fortunately, all
the computations involved in our algorithm can be made either exact or approximate to high precision in
NC. In our proofs we assume that all of these computations are exact, and we refer curious
readers to the previous version of this paper [GW10, Appendix B] for the details of handling these issues.

3.2 Preservation of subsystem fidelity 

Consider the following property of the fidelity function, which we call the preservation of subsystem fidelity:
if $\sigma, \sigma'$ are states of a quantum system with fidelity $F(\sigma, \sigma')$ and $\rho$ is any state of a larger system consistent
with $\sigma$ then it is always possible to find $\rho'$ consistent with $\sigma'$ such that $F(\rho, \rho') = F(\sigma, \sigma')$.

A formal construction of such a $\rho'$ appears in Jain et al. [JUW09]. Since their construction consists
entirely of the elementary matrix operations mentioned above, it admits a parallel
algorithm that takes as input $\sigma, \sigma', \rho$ and produces $\rho'$ in time bounded by a polylogarithm in the dimensions of
the input matrices $\sigma, \sigma', \rho$.
Proposition 2 (Preservation of subsystem fidelity; see Ref. [JUW09, Lemma 7.2] and [GW10, Lemma 2]).
Let $\sigma, \sigma' \in \mathrm{Dens}(\mathcal{V})$ and $\rho \in \mathrm{Dens}(\mathcal{AV})$ be density operators with $\operatorname{Tr}_{\mathcal{A}}(\rho) = \sigma$. There exists a density
operator $\rho' \in \mathrm{Dens}(\mathcal{AV})$ with $\operatorname{Tr}_{\mathcal{A}}(\rho') = \sigma'$ and $F(\rho, \rho') = F(\sigma, \sigma')$. Moreover, $\rho'$ can be computed
efficiently in parallel given $\sigma, \sigma', \rho$.
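
The statement of Proposition 2 can be illustrated numerically. The sketch below is not claimed to be the construction of [JUW09] or [GW10]; it is one Uhlmann-type construction that works under the extra assumption that the reduced state of $\rho$ on $\mathcal{V}$ is invertible, and it uses the fidelity $F(\rho, \xi) = \lVert \sqrt{\rho} \sqrt{\xi} \rVert_{\mathrm{Tr}}$. The function names and the random test states are hypothetical. The idea is to apply a single operator $T$ on $\mathcal{V}$ alone, chosen via a singular value decomposition so that the overlap of the corresponding purifications equals the subsystem fidelity.

```python
import numpy as np

def psd_sqrt(m):
    """Square root of a positive semidefinite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def fidelity(rho, xi):
    """F(rho, xi) = || sqrt(rho) sqrt(xi) ||_Tr."""
    return np.linalg.norm(psd_sqrt(rho) @ psd_sqrt(xi), "nuc")

def partial_trace_first(x, d_a, d_v):
    """Trace out the first tensor factor A of an operator on A (x) V."""
    return np.einsum("ijik->jk", x.reshape(d_a, d_v, d_a, d_v))

def random_density(dim, rng):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def lift_state(rho, sigma_p, d_a, d_v):
    """Return rho' on A (x) V with Tr_A(rho') = sigma_p and
    F(rho, rho') = F(Tr_A(rho), sigma_p).

    Assumption: Tr_A(rho) is invertible.
    """
    sigma = partial_trace_first(rho, d_a, d_v)
    s, sp = psd_sqrt(sigma), psd_sqrt(sigma_p)
    x, _, yh = np.linalg.svd(s @ sp)        # sqrt(sigma) sqrt(sigma') = X S Y^*
    u = yh.conj().T @ x.conj().T            # unitary U = Y X^*
    t = sp @ u @ np.linalg.inv(s)           # T sigma T^* = sigma'
    lift = np.kron(np.eye(d_a), t)          # act on V only
    return lift @ rho @ lift.conj().T

rng = np.random.default_rng(1)
d_a, d_v = 2, 2
rho = random_density(d_a * d_v, rng)
sigma = partial_trace_first(rho, d_a, d_v)
sigma_p = random_density(d_v, rng)
rho_p = lift_state(rho, sigma_p, d_a, d_v)
```

Comparing `fidelity(rho, rho_p)` with `fidelity(sigma, sigma_p)` and `partial_trace_first(rho_p, d_a, d_v)` with `sigma_p` confirms both properties up to rounding error.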

3.3 The Bures angle 

The Bures angle or simply the angle $A(\rho, \xi)$ between quantum states $\rho, \xi$ is defined by

\[ A(\rho, \xi) = \arccos F(\rho, \xi). \]

The angle is a metric on quantum states, meaning that it is nonnegative, equals zero only when $\rho = \xi$, and
obeys the triangle inequality [NC00]. Moreover, the angle is contractive, so that

\[ A(\Phi(\rho), \Phi(\xi)) \le A(\rho, \xi) \]

for any quantum channel $\Phi$. The Fuchs-van de Graaf inequalities establish a relationship between the fidelity
and trace norm [FvdG99]. The inequalities are

\[ 1 - F(\rho, \xi) \le \frac{1}{2} \lVert \rho - \xi \rVert_{\mathrm{Tr}} \le \sqrt{1 - F(\rho, \xi)^2}. \]

These inequalities can be used to derive a relationship between A(p, £) and || p — £ For example, 
Proposition 3 (Relationship between trace norm and Bures angle). For all density matrices p, £ it holds that 



1 TT 

Proof. The upper bound follows immediately from Fuchs-van de Graaf: 
f 



2 IIP " £lta < Vl " cos A(p,0 2 = sin < 

where we used the identity sin x < x for all x > 0. 

To obtain the lower bound we employ the identity cosx < 1 — x 2 /tt for x G [0, 7r/2], which can be 
verified using basic calculus. Then we have 

J||p-elk>l-cosA(p,0>^^ 
from which the proposition follows. □ 
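The Fuchs-van de Graaf inequalities and the bounds of Proposition 3 can be spot-checked numerically on small random states. The following is a minimal sketch, assuming the fidelity convention $F(\rho, \xi) = \|\sqrt{\rho}\sqrt{\xi}\|_{\mathrm{Tr}}$ (the non-squared fidelity, which is the convention under which the inequalities above hold as written). The helper names `random_density`, `psd_sqrt`, and `check_bounds` are illustrative and do not come from the paper.

```python
import numpy as np

def random_density(d, rng):
    """Random full-rank density matrix (normalized Ginibre construction)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def psd_sqrt(m):
    """Square root of a positive semidefinite Hermitian matrix."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)          # remove tiny negative rounding errors
    return (v * np.sqrt(w)) @ v.conj().T

def fidelity(rho, xi):
    """Assumed convention: F = trace norm of sqrt(rho) sqrt(xi)."""
    s = np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(xi), compute_uv=False)
    return float(min(1.0, s.sum()))

def trace_norm(m):
    """Sum of singular values."""
    return float(np.linalg.svd(m, compute_uv=False).sum())

def bures_angle(rho, xi):
    """A(rho, xi) = arccos F(rho, xi)."""
    return float(np.arccos(fidelity(rho, xi)))

def check_bounds(rho, xi, tol=1e-9):
    """Return (Fuchs-van de Graaf holds, Proposition 3 holds) up to tolerance."""
    f = fidelity(rho, xi)
    half = 0.5 * trace_norm(rho - xi)
    ang = bures_angle(rho, xi)
    fvdg = (1 - f <= half + tol) and (half <= np.sqrt(max(0.0, 1 - f * f)) + tol)
    prop3 = (ang ** 2 / np.pi <= half + tol) and (half <= ang + tol)
    return fvdg, prop3
```

Such a check is only a sanity test on sampled inputs; it does not replace the proof above.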

4 Double quantum interactive proofs 

A double quantum interactive proof is completely specified by a referee, which consists of a tuple $R = (|\psi\rangle, V_1, \dots, V_{a+b}, \Pi)$ where

$|\psi\rangle \in \mathcal{CV}$ is a pure state

$V_1, \dots, V_{a+b} \in \mathrm{U}(\mathcal{CV})$ are unitary operators

$\Pi \in \mathrm{Meas}(\mathcal{CV})$ is a projective measurement operator

The spaces $\mathcal{C}, \mathcal{V}$ correspond to registers $\mathsf{C}, \mathsf{V}$. The register $\mathsf{C}$ is a message register to be exchanged with the players and the register $\mathsf{V}$ is a private memory register for the referee.

Figure 2: An illustration of a double quantum interactive proof in which the referee $R = (|\psi\rangle, V_1, \dots, V_6, \Pi)$ exchanges $a = 3$ rounds of messages with Alice followed by $b = 3$ rounds of messages with Bob before performing the measurement $\{\Pi, I - \Pi\}$ and announcing a winner. The register $\mathsf{C}$ is a message register to be exchanged among the referee and players. The registers $\mathsf{V}$, $\mathsf{A}$, and $\mathsf{B}$ are private memory registers for the referee, Alice and Bob, respectively. Any choice of $(A_1, A_2, A_3)$ and $(B_1, B_2, B_3)$ induces a state $\rho$ and a measurement operator $P$ as indicated. Bob's winning probability is given by $\langle \rho, P \rangle = \mathrm{Tr}(\rho P)$.

The actions of the players during each round of interaction are specified by unitary operators acting upon the message register $\mathsf{C}$ and a private memory register for that player. In particular, Alice's actions are specified by unitary operators $A_1, \dots, A_a \in \mathrm{U}(\mathcal{CA})$ where the space $\mathcal{A}$ corresponds to the private memory register $\mathsf{A}$ for Alice. Similarly, Bob's actions are specified by unitary operators $B_1, \dots, B_b \in \mathrm{U}(\mathcal{CB})$ where the space $\mathcal{B}$ corresponds to the private memory register $\mathsf{B}$ for Bob.

The game proceeds as suggested by Figure 2 and is described as follows:

1. The referee prepares the registers $(\mathsf{C}, \mathsf{V})$ in the pure state $|\psi\rangle$. The players' private registers $\mathsf{A}, \mathsf{B}$ are both initialized to the pure state $|0\rangle$.

2. For $i = 1, \dots, a$: The register $\mathsf{C}$ is sent to Alice, who applies $A_i$ to the registers $(\mathsf{C}, \mathsf{A})$. The register $\mathsf{C}$ is then returned to the referee, who applies $V_i$ to the registers $(\mathsf{C}, \mathsf{V})$.

3. For $i = 1, \dots, b$: The register $\mathsf{C}$ is sent to Bob, who applies $B_i$ to the registers $(\mathsf{C}, \mathsf{B})$. The register $\mathsf{C}$ is then returned to the referee, who applies $V_{a+i}$ to the registers $(\mathsf{C}, \mathsf{V})$.

4. The referee applies the binary-valued measurement $\{\Pi, I - \Pi\}$ on the registers $(\mathsf{C}, \mathsf{V})$, with the outcome associated with $\Pi$ indicating victory for Bob.

Basic quantum formalism tells us that if Alice and Bob act according to $(A_1, \dots, A_a)$ and $(B_1, \dots, B_b)$, respectively, then the probability with which Bob is declared the winner is given by

$$\Pr[\text{Bob wins} \mid (A_1, \dots, A_a), (B_1, \dots, B_b)] = \left\| \Pi V_{a+b} B_b V_{a+b-1} B_{b-1} \cdots B_1 V_a A_a V_{a-1} A_{a-1} \cdots A_2 V_1 A_1 |\psi\rangle \right\|^2. \qquad (3)$$

(For clarity we have suppressed numerous tensors with identity and the initial states $|0\rangle$ of the players' private memory registers.)

Of course, Bob wishes to maximize this quantity while Alice wishes to minimize it. It follows immediately from the min-max theorem for zero-sum quantum games [GW07] that every double quantum interactive proof with referee $R$ has a value $\lambda(R)$ given by

$$\lambda(R) = \min_{(A_1, \dots, A_a)} \max_{(B_1, \dots, B_b)} \Pr[\text{Bob wins} \mid (A_1, \dots, A_a), (B_1, \dots, B_b)] \qquad (4)$$

$$\phantom{\lambda(R)} = \max_{(B_1, \dots, B_b)} \min_{(A_1, \dots, A_a)} \Pr[\text{Bob wins} \mid (A_1, \dots, A_a), (B_1, \dots, B_b)]$$

where the minima are over all private spaces $\mathcal{A}$ for Alice and all unitaries $A_1, \dots, A_a \in \mathrm{U}(\mathcal{CA})$ and the maxima are over all private spaces $\mathcal{B}$ for Bob and all unitaries $B_1, \dots, B_b \in \mathrm{U}(\mathcal{CB})$. In particular, for every double quantum interactive proof with referee $R$ there exist optimal actions $(A_1^\star, \dots, A_a^\star)$ for Alice and $(B_1^\star, \dots, B_b^\star)$ for Bob such that

$$\Pr[\text{Bob wins} \mid (A_1^\star, \dots, A_a^\star), (B_1, \dots, B_b)] \le \lambda(R) \quad \text{for all } (B_1, \dots, B_b),$$

$$\Pr[\text{Bob wins} \mid (A_1, \dots, A_a), (B_1^\star, \dots, B_b^\star)] \ge \lambda(R) \quad \text{for all } (A_1, \dots, A_a).$$

From an operational perspective, the min-max expression (4) for $\lambda(R)$ in terms of unitaries $(A_1, \dots, A_a)$ and $(B_1, \dots, B_b)$ is natural and intuitive. However, this expression does not lend itself well to the MMW, which is designed to solve min-max problems over domains of density operators, not tuples of unitaries. To address this problem we derive an alternate expression for $\lambda(R)$ that is more amenable to the MMW.

To this end, for any $(A_1, \dots, A_a)$ and $(B_1, \dots, B_b)$ let $\rho$ be the reduced state of the registers $(\mathsf{C}, \mathsf{V})$ immediately after Alice's final unitary is applied and let $P$ be the measurement operator on $(\mathsf{C}, \mathsf{V})$ obtained by bundling the referee-Bob interaction into a single measurement operator as suggested by Figure 2. The expression (3) for Bob's probability of victory can be rewritten in terms of $\rho, P$ as

$$\Pr[\text{Bob wins} \mid (A_1, \dots, A_a), (B_1, \dots, B_b)] = \langle \rho, P \rangle.$$
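The equivalence between expression (3) and the bundled form $\langle \rho, P \rangle$ can be illustrated on a toy instance. The sketch below makes a simplifying assumption that is not in the paper: the players have no private memory, so every operator acts on a single $d$-dimensional space standing in for $(\mathsf{C}, \mathsf{V})$ and no partial trace is needed. The function names are illustrative, and at least one round with Alice ($a \ge 1$) is assumed.

```python
import numpy as np

def random_unitary(d, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, _ = np.linalg.qr(g)
    return q

def bob_win_probability(psi, V, A, B, Pi):
    """Evaluate Eq. (3): || Pi V_{a+b} B_b ... B_1 V_a A_a ... V_1 A_1 |psi> ||^2."""
    a, b = len(A), len(B)
    state = psi
    for i in range(a):
        state = V[i] @ (A[i] @ state)       # Alice acts, then the referee
    for i in range(b):
        state = V[a + i] @ (B[i] @ state)   # Bob acts, then the referee
    return float(np.linalg.norm(Pi @ state) ** 2)

def bundled_form(psi, V, A, B, Pi):
    """Return (rho, P): the state after Alice's last unitary and Bob's bundled measurement."""
    a, b = len(A), len(B)
    phi = psi
    for i in range(a):
        phi = A[i] @ phi
        if i < a - 1:
            phi = V[i] @ phi                # V_a is absorbed into P, not into rho
    U = V[a - 1]
    for i in range(b):
        U = V[a + i] @ B[i] @ U
    rho = np.outer(phi, phi.conj())
    P = U.conj().T @ Pi @ U
    return rho, P
```

With the players' private registers restored, $\rho$ would instead be the reduced state obtained by tracing out Alice's register; the sketch omits that step.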

Similarly, the expression (4) for $\lambda(R)$ can be rewritten as

$$\lambda(R) = \min_{\rho \in \mathbf{A}(R)} \max_{P \in \mathbf{P}(R)} \langle \rho, P \rangle = \max_{P \in \mathbf{P}(R)} \min_{\rho \in \mathbf{A}(R)} \langle \rho, P \rangle$$

where the sets $\mathbf{A}(R) \subset \mathrm{Dens}(\mathcal{CV})$ and $\mathbf{P}(R) \subset \mathrm{Meas}(\mathcal{CV})$ are given by

$$\mathbf{A}(R) = \left\{ \mathrm{Tr}_{\mathcal{A}}(|\phi\rangle\langle\phi|) : |\phi\rangle = A_a V_{a-1} A_{a-1} \cdots A_2 V_1 A_1 |\psi\rangle \text{ for some } (A_1, \dots, A_a) \right\}, \qquad (5)$$

$$\mathbf{P}(R) = \left\{ U^* \Pi U : U = V_{a+b} B_b V_{a+b-1} B_{b-1} \cdots B_1 V_a \text{ for some } (B_1, \dots, B_b) \right\}. \qquad (6)$$

At this point, we have rewritten $\lambda(R)$ so that the set of all possible actions available to Alice has been identified with a subset $\mathbf{A}(R)$ of density operators, as desired. (Bob's actions will be addressed later.) However, the MMW is designed to solve min-max problems whose domain is the entire set of density operators. In the next section we present a new adaptation of the MMW that applies to min-max problems on strict subsets of density operators. We will see that this adaptation yields a parallel algorithm for the above formulation of $\lambda(R)$.

5 Rounding theorem for a relaxed min-max problem 

In this section we define a new min-max expression $\mu_\varepsilon(R)$ that approximates the desired quantity $\lambda(R)$ in the limit as $\varepsilon$ approaches zero. The new expression is a relaxation of $\lambda(R)$ that is more amenable to the MMW. We prove a "rounding theorem" by which near-optimal points for $\lambda(R)$ are efficiently obtained from near-optimal points for $\mu_\varepsilon(R)$.

We begin in Section 5.1 with a review of the consistency conditions for transcripts, which motivate our definition of $\mu_\varepsilon(R)$. A formal definition of $\mu_\varepsilon(R)$ and proof of the rounding theorem appear in Section 5.2. Section 5.3 contains the proof of a technical lemma and its corollary that is used in the rounding theorem. Our use of the Bures metric occurs in the proof of this lemma.

Figure 3: The states $\rho_1, \rho_2, \rho_3$ are a transcript of the referee's conversation with Alice. It follows easily from the unitary equivalence of purifications that a triple $(\rho_1, \rho_2, \rho_3)$ is a valid transcript if and only if it obeys the recursive relation $\mathrm{Tr}_{\mathcal{C}}(\rho_{i+1}) = \mathrm{Tr}_{\mathcal{C}}(V_i \rho_i V_i^*)$ for $i = 0, 1, 2$ where $V_0 = I$.



5.1 Consistency conditions for Alice 

The set $\mathbf{A}(R)$ of density operators that represent admissible actions for Alice as defined in (5) is unwieldy. In order to optimize over this set we begin by writing it not in terms of unitaries $A_1, \dots, A_a$ but in terms of states $\rho_1, \dots, \rho_a$ that represent a transcript of the referee's conversation with Alice. Such a transcript is depicted in Figure 3. It is straightforward to use the unitary equivalence of purifications to characterize those density matrices which constitute valid transcripts. This characterization was first noted by Kitaev [Kit02] and a formal proof can be found in Ref. [Gut05].

Proposition 4 (Kitaev's consistency conditions; see Ref. [Gut05]). Let $R = (|\psi\rangle, V_1, \dots, V_{a+b}, \Pi)$ be a referee and let $\mathbf{A}(R)$ be the set of admissible states for Alice as defined in Eq. (5). A given state $\rho$ is an element of $\mathbf{A}(R)$ if and only if there exist $\rho_1, \dots, \rho_a \in \mathrm{Dens}(\mathcal{CV})$ with $\rho_a = \rho$ and

$$\mathrm{Tr}_{\mathcal{C}}(\rho_{i+1}) = \mathrm{Tr}_{\mathcal{C}}(V_i \rho_i V_i^*) \quad \text{for } i = 0, \dots, a-1$$

where we have written $V_0 = I$ and $\rho_0 = |\psi\rangle\langle\psi|$ for convenience.
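The consistency conditions of Proposition 4 are straightforward to test numerically for a candidate transcript. The following is a minimal sketch, assuming the message space $\mathcal{C}$ is the first tensor factor of $\mathcal{C} \otimes \mathcal{V}$ and that `V[i]` stores $V_{i+1}$ (zero-based lists); the function names are illustrative rather than taken from the paper.

```python
import numpy as np

def trace_out_C(op, dC, dV):
    """Partial trace over the first factor of an operator on C (x) V."""
    return np.trace(op.reshape(dC, dV, dC, dV), axis1=0, axis2=2)

def consistency_violations(psi, V, rhos, dC, dV):
    """Trace norm of Tr_C(rho_{i+1}) - Tr_C(V_i rho_i V_i^*) for i = 0, ..., a-1."""
    prev = np.outer(psi, psi.conj())          # rho_0 = |psi><psi|
    Vprev = np.eye(dC * dV)                   # V_0 = I
    out = []
    for i, rho in enumerate(rhos):            # rho plays the role of rho_{i+1}
        lhs = trace_out_C(rho, dC, dV)
        rhs = trace_out_C(Vprev @ prev @ Vprev.conj().T, dC, dV)
        out.append(float(np.linalg.svd(lhs - rhs, compute_uv=False).sum()))
        prev, Vprev = rho, V[i]
    return out
```

A transcript is consistent with the referee exactly when every entry of the returned list is zero (up to numerical tolerance); the same quantities appear as penalty terms in the relaxation of Section 5.2.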

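Any states $\rho_1, \dots, \rho_a$ obeying the consistency condition of Proposition 4 are said to be consistent with $R$. It therefore follows from Proposition 4 that the value $\lambda(R)$ of the game may be written

$$\lambda(R) = \min_{\substack{(\rho_1, \dots, \rho_a) \\ \text{consistent with } R}} \; \max_{P \in \mathbf{P}(R)} \langle \rho_a, P \rangle. \qquad (7)$$

5.2 A relaxed min-max problem and a rounding theorem

Define the relaxation $\mu_\varepsilon(R)$ of $\lambda(R)$ by

$$\mu_\varepsilon(R) = \min_{(\rho_1, \dots, \rho_a)} \; \max_{\substack{P \in \mathbf{P}(R) \\ (\Pi_1, \dots, \Pi_a)}} \; \langle \rho_a, P \rangle + \frac{a}{\varepsilon} \sum_{i=0}^{a-1} \left\langle \mathrm{Tr}_{\mathcal{C}}(\rho_{i+1}) - \mathrm{Tr}_{\mathcal{C}}(V_i \rho_i V_i^*), \Pi_{i+1} \right\rangle$$

$$\phantom{\mu_\varepsilon(R)} = \min_{(\rho_1, \dots, \rho_a)} \; \max_{P \in \mathbf{P}(R)} \; \langle \rho_a, P \rangle + \frac{a}{\varepsilon} \sum_{i=0}^{a-1} \frac{1}{2} \left\| \mathrm{Tr}_{\mathcal{C}}(\rho_{i+1}) - \mathrm{Tr}_{\mathcal{C}}(V_i \rho_i V_i^*) \right\|_{\mathrm{Tr}}.$$

Here the minimum is taken over all density operators $\rho_1, \dots, \rho_a \in \mathrm{Dens}(\mathcal{CV})$ and the maximum over all $P \in \mathbf{P}(R)$ and over all measurement operators $\Pi_1, \dots, \Pi_a \in \mathrm{Meas}(\mathcal{V})$. The second equality follows immediately from the identity $\frac{1}{2}\|\rho - \xi\|_{\mathrm{Tr}} = \max_{0 \preceq \Pi \preceq I} \langle \rho - \xi, \Pi \rangle$, which holds for all density operators $\rho, \xi$.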
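The maximum in the identity $\frac{1}{2}\|\rho - \xi\|_{\mathrm{Tr}} = \max_{0 \preceq \Pi \preceq I} \langle \rho - \xi, \Pi \rangle$ is attained by the projection onto the positive eigenspace of $\rho - \xi$, which is the projection computed in step 2b of the algorithm in Figure 4. A minimal sketch of that computation follows; the function names are illustrative.

```python
import numpy as np

def positive_eigenspace_projector(h):
    """Projection onto the span of eigenvectors of Hermitian h with positive eigenvalue."""
    w, v = np.linalg.eigh(h)
    pos = v[:, w > 0]
    return pos @ pos.conj().T

def half_trace_norm(h):
    """Half the trace norm of a Hermitian matrix (half the sum of |eigenvalues|)."""
    return 0.5 * float(np.abs(np.linalg.eigvalsh(h)).sum())
```

Because $\rho - \xi$ is traceless, its positive eigenvalues sum to exactly half of its trace norm, which is why the projector achieves the maximum.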

Notice that the minimum in the definition of $\mu_\varepsilon(R)$ is taken over all density operators, not just those consistent with $R$. Each term in the summation serves to penalize any violation of consistency in the choice of $\rho_1, \dots, \rho_a$ by adding the magnitude of that violation to Bob's probability of victory. The $a/\varepsilon$ factor amplifies the penalty so as to remove incentive for Alice to select an inconsistent course of action. Indeed, it is clear that

$$\lim_{\varepsilon \to 0} \mu_\varepsilon(R) = \lambda(R).$$

The following "rounding" theorem establishes a specific rate of convergence for this limit and a means by which near-optimal points for $\lambda(R)$ are efficiently computed from near-optimal points for $\mu_\varepsilon(R)$.

Before proving Theorem 5 we need some terminology. Consider any equilibrium value $\lambda$ of the form

$$\lambda = \min_{a \in A} \max_{b \in B} f(a, b) = \max_{b \in B} \min_{a \in A} f(a, b).$$

A pair $(a, b)$ is called $\delta$-optimal for $\lambda$ if

$$\max_{b \in B} f(a, b) \le \lambda + \delta \quad \text{and} \quad \min_{a \in A} f(a, b) \ge \lambda - \delta.$$

Elements that are 0-optimal are simply called optimal. Any value $\tilde{\lambda}$ is called $\delta$-optimal for $\lambda$ if $|\tilde{\lambda} - \lambda| \le \delta$.
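The notion of $\delta$-optimality can be made concrete with a classical analogue, which is an illustration only and not the quantum setting of this paper: a finite zero-sum game $f(a, b) = a^{\mathsf{T}} G b$ over probability vectors, where the row player minimizes and the column player maximizes. The payoff matrix and function name below are assumptions chosen for the example.

```python
import numpy as np

def optimality_gap(G, a, b, value):
    """Smallest delta for which (a, b) is delta-optimal for value = min_a max_b a^T G b."""
    worst_for_a = (a @ G).max()     # max over b of f(a, b); attained at a pure strategy
    worst_for_b = (G @ b).min()     # min over a of f(a, b); attained at a pure strategy
    return max(worst_for_a - value, value - worst_for_b, 0.0)

# Rock-paper-scissors payoffs: the value is 0 and the uniform strategies are optimal.
G = np.array([[0, 1, -1], [-1, 0, 1], [1, -1, 0]], dtype=float)
uniform = np.ones(3) / 3
gap_uniform = optimality_gap(G, uniform, uniform, 0.0)
gap_pure = optimality_gap(G, np.array([1.0, 0.0, 0.0]), uniform, 0.0)
```

Here the uniform pair is 0-optimal, while pairing a pure strategy for the minimizing player with the uniform strategy is only 1-optimal.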
Theorem 5 (Rounding theorem). The following hold for any referee $R$ and any $\varepsilon, \delta > 0$:

1. $\lambda(R) \ge \mu_\varepsilon(R) \ge \lambda(R) - \varepsilon$.

2. If $(P, \Pi_1, \dots, \Pi_a)$ is $\delta$-optimal for $\mu_\varepsilon(R)$ then $P \in \mathbf{P}(R)$ is also $(\delta + \varepsilon)$-optimal for $\lambda(R)$.

3. If $(\rho_1, \dots, \rho_a)$ is $\delta$-optimal for $\mu_\varepsilon(R)$ then there exist density operators $(\rho'_1, \dots, \rho'_a)$ consistent with $R$ that can be computed in parallel time $O(a \, \mathrm{polylog}(\dim(\mathcal{CV})))$ such that $\rho'_a \in \mathbf{A}(R)$ is $(\delta + \varepsilon)$-optimal for $\lambda(R)$.

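Proof. We begin with item 1. The first inequality is easy: let $(\rho_1^\star, \dots, \rho_a^\star)$ achieve the minimum for $\lambda(R)$ in Eq. (7) among all density operators consistent with $R$. Let $(P^\star, \Pi_1^\star, \dots, \Pi_a^\star)$ achieve the maximum for $\mu_\varepsilon(R)$. Then we have

$$\lambda(R) \ge \langle \rho_a^\star, P^\star \rangle = \langle \rho_a^\star, P^\star \rangle + \frac{a}{\varepsilon} \sum_{i=0}^{a-1} \left\langle \mathrm{Tr}_{\mathcal{C}}(\rho_{i+1}^\star) - \mathrm{Tr}_{\mathcal{C}}(V_i \rho_i^\star V_i^*), \Pi_{i+1}^\star \right\rangle \ge \mu_\varepsilon(R),$$

where the equality holds because every term of the sum vanishes for a consistent transcript.

The second inequality is more difficult. Choose any density operators $\rho_1, \dots, \rho_a$. By Lemma 7 (the statement of which appears below in Section 5.3) there exist density operators $\rho'_1, \dots, \rho'_a$ consistent with $R$ such that

$$\frac{1}{2} \|\rho_a - \rho'_a\|_{\mathrm{Tr}} \le \varepsilon + \frac{a}{\varepsilon} \sum_{i=0}^{a-1} \frac{1}{2} \left\| \mathrm{Tr}_{\mathcal{C}}(\rho_{i+1}) - \mathrm{Tr}_{\mathcal{C}}(V_i \rho_i V_i^*) \right\|_{\mathrm{Tr}}.$$

For any measurement operator $P$ we have

$$\langle \rho_a, P \rangle = \langle \rho'_a, P \rangle + \langle \rho_a - \rho'_a, P \rangle \ge \langle \rho'_a, P \rangle - \frac{1}{2} \|\rho_a - \rho'_a\|_{\mathrm{Tr}}$$

$$\phantom{\langle \rho_a, P \rangle} \ge \langle \rho'_a, P \rangle - \varepsilon - \frac{a}{\varepsilon} \sum_{i=0}^{a-1} \frac{1}{2} \left\| \mathrm{Tr}_{\mathcal{C}}(\rho_{i+1}) - \mathrm{Tr}_{\mathcal{C}}(V_i \rho_i V_i^*) \right\|_{\mathrm{Tr}}. \qquad (8)$$

The inequality (8) will be employed several times throughout the rest of this proof.

To complete the proof of item 1 let $\rho_1^\circ, \dots, \rho_a^\circ$ be optimal for $\mu_\varepsilon(R)$ and let $P^\circ \in \mathbf{P}(R)$ be optimal for $\lambda(R)$. Employing (8) for the choices $(\rho_1, \dots, \rho_a) = (\rho_1^\circ, \dots, \rho_a^\circ)$ and $P = P^\circ$ we obtain

$$\mu_\varepsilon(R) \ge \langle \rho_a^\circ, P^\circ \rangle + \frac{a}{\varepsilon} \sum_{i=0}^{a-1} \frac{1}{2} \left\| \mathrm{Tr}_{\mathcal{C}}(\rho_{i+1}^\circ) - \mathrm{Tr}_{\mathcal{C}}(V_i \rho_i^\circ V_i^*) \right\|_{\mathrm{Tr}} \ge \langle \rho'_a, P^\circ \rangle - \varepsilon \ge \lambda(R) - \varepsilon,$$

which establishes the desired lower bound on $\mu_\varepsilon(R)$.

Item 3 follows easily from the above construction. For any $P \in \mathbf{P}(R)$ we may substitute the given $\delta$-optimal $(\rho_1, \dots, \rho_a)$ into (8) and use the fact that $\lambda(R) \ge \mu_\varepsilon(R)$ to obtain

$$\lambda(R) + \delta \ge \mu_\varepsilon(R) + \delta \ge \langle \rho_a, P \rangle + \frac{a}{\varepsilon} \sum_{i=0}^{a-1} \frac{1}{2} \left\| \mathrm{Tr}_{\mathcal{C}}(\rho_{i+1}) - \mathrm{Tr}_{\mathcal{C}}(V_i \rho_i V_i^*) \right\|_{\mathrm{Tr}} \ge \langle \rho'_a, P \rangle - \varepsilon,$$

from which it follows that $\rho'_a$ is $(\delta + \varepsilon)$-optimal for $\lambda(R)$.

Item 2 can be proven using the fact that $\lambda(R) - \varepsilon \le \mu_\varepsilon(R)$ without making further use of the above construction. For any $\rho_1, \dots, \rho_a$ consistent with $R$ we have

$$\lambda(R) - \varepsilon - \delta \le \mu_\varepsilon(R) - \delta \le \langle \rho_a, P \rangle,$$

from which it follows that $P$ is also $(\delta + \varepsilon)$-optimal for $\lambda(R)$. □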

5.3 Rounding lemma for obtaining consistent states 

In this subsection we prove a technical lemma and its corollary that appeared in the proof of Theorem 5. Given any states $\rho_1, \dots, \rho_a$, this lemma asserts that these states can be "rounded" to valid transcript states $\rho'_1, \dots, \rho'_a$ in such a way that the distance between the final states $\rho_a$ and $\rho'_a$ is bounded by a function of the extent to which $\rho_1, \dots, \rho_a$ violate the consistency condition of Proposition 4. The proof of this lemma is interesting because it provides a nontrivial application of the Bures angle.

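Lemma 6. For any referee $R = (|\psi\rangle, V_1, \dots, V_{a+b}, \Pi)$ and any $\rho_1, \dots, \rho_a \in \mathrm{Dens}(\mathcal{CV})$ there exist $\rho'_1, \dots, \rho'_a \in \mathrm{Dens}(\mathcal{CV})$ consistent with $R$ such that

$$A(\rho_a, \rho'_a) \le \sum_{i=0}^{a-1} A\left( \mathrm{Tr}_{\mathcal{C}}(\rho_{i+1}), \mathrm{Tr}_{\mathcal{C}}(V_i \rho_i V_i^*) \right).$$

Moreover, $\rho'_1, \dots, \rho'_a$ can be computed in parallel time $O(a \, \mathrm{polylog}(\dim(\mathcal{CV})))$.

Proof. Define $\rho'_1, \dots, \rho'_a$ recursively as follows. Let $\rho'_0 = \rho_0$. For each $i = 0, \dots, a-1$, by the preservation of subsystem fidelity (Proposition 2) there exists $\rho'_{i+1}$ (which can be efficiently computed) with $\mathrm{Tr}_{\mathcal{C}}(\rho'_{i+1}) = \mathrm{Tr}_{\mathcal{C}}(V_i \rho'_i V_i^*)$ and

$$A(\rho_{i+1}, \rho'_{i+1}) = A\left( \mathrm{Tr}_{\mathcal{C}}(\rho_{i+1}), \mathrm{Tr}_{\mathcal{C}}(V_i \rho'_i V_i^*) \right) \quad \text{(preservation of fidelity)}$$

$$\le A\left( \mathrm{Tr}_{\mathcal{C}}(\rho_{i+1}), \mathrm{Tr}_{\mathcal{C}}(V_i \rho_i V_i^*) \right) + A\left( \mathrm{Tr}_{\mathcal{C}}(V_i \rho_i V_i^*), \mathrm{Tr}_{\mathcal{C}}(V_i \rho'_i V_i^*) \right) \quad \text{(triangle inequality)}$$

$$\le A\left( \mathrm{Tr}_{\mathcal{C}}(\rho_{i+1}), \mathrm{Tr}_{\mathcal{C}}(V_i \rho_i V_i^*) \right) + A(\rho_i, \rho'_i). \quad \text{(contractivity)}$$

The lemma now follows inductively from the fact that $A(\rho_0, \rho'_0) = 0$. □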
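Lemma 7. For any $\varepsilon > 0$ the bound in Lemma 6 can be written in terms of the trace norm as

$$\frac{1}{2} \|\rho_a - \rho'_a\|_{\mathrm{Tr}} \le \varepsilon + \frac{a}{\varepsilon} \sum_{i=0}^{a-1} \frac{1}{2} \left\| \mathrm{Tr}_{\mathcal{C}}(\rho_{i+1}) - \mathrm{Tr}_{\mathcal{C}}(V_i \rho_i V_i^*) \right\|_{\mathrm{Tr}}.$$

Proof. It follows immediately from Lemma 6 and Proposition 3 (Relationship between trace norm and Bures angle) that

$$\frac{1}{2} \|\rho_a - \rho'_a\|_{\mathrm{Tr}} \le \sum_{i=0}^{a-1} \sqrt{\frac{\pi}{2} \left\| \mathrm{Tr}_{\mathcal{C}}(\rho_{i+1}) - \mathrm{Tr}_{\mathcal{C}}(V_i \rho_i V_i^*) \right\|_{\mathrm{Tr}}}.$$

The lemma then follows from the fact that $\sqrt{\pi x} \le x/\delta + \delta$ for all $x \ge 0$ and all $\delta > 0$, applied to each term of the sum with $x = \frac{1}{2}\|\mathrm{Tr}_{\mathcal{C}}(\rho_{i+1}) - \mathrm{Tr}_{\mathcal{C}}(V_i \rho_i V_i^*)\|_{\mathrm{Tr}}$ and $\delta = \varepsilon/a$. □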



6 The MMW oracle-algorithm for double quantum interactive proofs 

In this section we describe an efficient parallel oracle-algorithm that approximates $\lambda(R)$ to arbitrary precision. (It is a simple matter to modify this algorithm so as to also produce unitaries $A_1, \dots, A_a$ for Alice and $B_1, \dots, B_b$ for Bob that are arbitrarily close to optimal. See Section 9.1.)

We begin in Section 6.1 with formal statements of the problem solved by our algorithm and the oracle it requires, as well as a brief review of the relevant facts concerning the MMW. Our algorithm and its analysis are provided in Section 6.2. In Section 6.3 we note that our algorithm can be used to approximate the solution of a semidefinite program on consistent density matrices efficiently in parallel, from which we recover the SDP (1) and the direct proof of QIP = PSPACE mentioned at the beginning of this paper.

6.1 Preliminaries: formal statement of the problem, review of the MMW 

Precise statements of the problem solved by our algorithm and the oracle it requires are given below. For matrix inputs, each entry is written explicitly. The real and imaginary parts of all numbers are written as rational numbers in binary.

Problem 1 (Approximation of $\lambda(R)$).
Input: A referee $R = (|\psi\rangle, V_1, \dots, V_{a+b}, \Pi)$ and an accuracy parameter $\delta > 0$.
Oracle: Weak optimization for $\mathbf{P}(R)$. (See Problem 2 below.)
Output: A number $\tilde{\lambda}$ with $|\tilde{\lambda} - \lambda(R)| \le \delta$.

Problem 2 (Weak optimization for $\mathbf{P}(R)$).
Input: A referee $R = (|\psi\rangle, V_1, \dots, V_{a+b}, \Pi)$, a density operator $\rho \in \mathrm{Dens}(\mathcal{CV})$, and an accuracy parameter $\delta > 0$.
Output: A measurement operator $\tilde{P} \in \mathbf{P}(R)$ such that $\langle \rho, \tilde{P} \rangle \ge \langle \rho, P \rangle - \delta$ for every $P \in \mathbf{P}(R)$.

The precise formulation of the MMW used in this paper is stated below as Theorem 8. Our statement of this theorem is somewhat nonstandard: the result is usually presented in the form of an algorithm, whereas our presentation is purely mathematical. However, a cursory examination of the literature (say, Kale's thesis [Kal07, Chapter 3]) reveals that our mathematical formulation is equivalent to the more conventional algorithmic form.

Theorem 8 (Multiplicative weights update method; see Ref. [Kal07, Theorem 10]). Fix $\gamma \in (0, 1/2)$. Let $M^{(1)}, \dots, M^{(T)}$ be arbitrary $D \times D$ "loss" matrices with $0 \preceq M^{(t)} \preceq \alpha I$. Let $W^{(1)}, \dots, W^{(T)}$ be $D \times D$ "weight" matrices given by

$$W^{(1)} = I, \qquad W^{(t+1)} = \exp\left( -\gamma \left( M^{(1)} + \cdots + M^{(t)} \right) \right).$$

Let $\rho^{(1)}, \dots, \rho^{(T)}$ be density operators obtained by normalizing each $W^{(1)}, \dots, W^{(T)}$ so that $\rho^{(t)} = W^{(t)} / \mathrm{Tr}(W^{(t)})$. For all density operators $\rho$ it holds that

Note that Theorem 8 holds for all choices of loss matrices $M^{(1)}, \dots, M^{(T)}$, including those for which each $M^{(t)}$ is chosen adversarially based upon $W^{(1)}, \dots, W^{(t)}$. This adaptive selection of loss matrices is typical in implementations of the MMW.
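The weight update and normalization described in Theorem 8 can be sketched in a few lines. This is a minimal illustration, assuming Hermitian loss matrices and using an eigendecomposition for the matrix exponential. The callback `loss_fn`, which may inspect the current density operator, is a hypothetical interface introduced here to model adaptive selection of losses; it is not part of the paper.

```python
import numpy as np

def herm_expm(h):
    """Matrix exponential of a Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(w)) @ v.conj().T

def mmw_densities(loss_fn, dim, gamma, T):
    """Run T rounds of the update W^(t+1) = exp(-gamma (M^(1) + ... + M^(t))).

    loss_fn(rho, t) returns the loss matrix M^(t); it may depend on rho^(t),
    which models adaptive (even adversarial) selection of losses.
    """
    total = np.zeros((dim, dim), dtype=complex)
    rhos, losses = [], []
    for t in range(1, T + 1):
        W = herm_expm(-gamma * total)         # W^(1) = exp(0) = I
        rho = W / np.trace(W).real            # normalize to a density operator
        M = loss_fn(rho, t)
        rhos.append(rho)
        losses.append(M)
        total = total + M
    return rhos, losses

# Example: a fixed loss that penalizes the first basis state.
M_fixed = np.diag([1.0, 0.0]).astype(complex)
rhos, losses = mmw_densities(lambda rho, t: M_fixed, dim=2, gamma=0.25, T=50)
```

In this example the density operators start at the maximally mixed state and progressively shift weight toward the basis state that incurs no loss.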

6.2 Statement and analysis of the MMW oracle-algorithm 

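Let $\varepsilon > 0$ and consider the linear mapping

$$f_{R,\varepsilon} : (\rho_1, \dots, \rho_a) \mapsto \Big( \rho_a, \; \tfrac{a}{\varepsilon} \left[ \mathrm{Tr}_{\mathcal{C}}(\rho_a) - \mathrm{Tr}_{\mathcal{C}}(V_{a-1} \rho_{a-1} V_{a-1}^*) \right], \; \dots, \; \tfrac{a}{\varepsilon} \left[ \mathrm{Tr}_{\mathcal{C}}(\rho_2) - \mathrm{Tr}_{\mathcal{C}}(V_1 \rho_1 V_1^*) \right], \; \tfrac{a}{\varepsilon} \left[ \mathrm{Tr}_{\mathcal{C}}(\rho_1) - \mathrm{Tr}(\rho_1) \mathrm{Tr}_{\mathcal{C}}(|\psi\rangle\langle\psi|) \right] \Big).$$

It is clear that

$$\mu_\varepsilon(R) = \min_{(\rho_1, \dots, \rho_a)} \; \max_{\substack{P \in \mathbf{P}(R) \\ (\Pi_1, \dots, \Pi_a)}} \left\langle f_{R,\varepsilon}(\rho_1, \dots, \rho_a), (P, \Pi_a, \dots, \Pi_1) \right\rangle.$$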

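It is tedious but straightforward to compute the adjoint map $f^*_{R,\varepsilon}$:

$$f^*_{R,\varepsilon} : (P, \Pi_a, \dots, \Pi_1) \mapsto \Big( P + \tfrac{a}{\varepsilon} \Pi_a \otimes I_{\mathcal{C}}, \; \tfrac{a}{\varepsilon} \left[ \Pi_{a-1} \otimes I_{\mathcal{C}} - V_{a-1}^* (\Pi_a \otimes I_{\mathcal{C}}) V_{a-1} \right], \; \dots, \; \tfrac{a}{\varepsilon} \left[ \Pi_2 \otimes I_{\mathcal{C}} - V_2^* (\Pi_3 \otimes I_{\mathcal{C}}) V_2 \right], \; \tfrac{a}{\varepsilon} \left[ \Pi_1 \otimes I_{\mathcal{C}} - V_1^* (\Pi_2 \otimes I_{\mathcal{C}}) V_1 - \langle\psi| \Pi_1 \otimes I_{\mathcal{C}} |\psi\rangle I_{\mathcal{CV}} \right] \Big).$$

The statement of our MMW algorithm in Figure 4 employs this formula for the adjoint.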
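The defining property of the adjoint, $\langle f_{R,\varepsilon}(\rho_1, \dots, \rho_a), (P, \Pi_a, \dots, \Pi_1) \rangle = \langle (\rho_1, \dots, \rho_a), f^*_{R,\varepsilon}(P, \Pi_a, \dots, \Pi_1) \rangle$, can be checked numerically on small random instances. The sketch below assumes that $\mathcal{C}$ is the first tensor factor (so that $\Pi \otimes I_{\mathcal{C}}$ is realized as `kron(I_C, Pi)`), that the scalar `c` stands for $a/\varepsilon$, and that `V[i-1]` stores $V_i$ and `Pis[i-1]` stores $\Pi_i$. All function names are illustrative, and the adjoint blocks are returned in the order $i = 1, \dots, a$ so that they pair directly with $(\rho_1, \dots, \rho_a)$.

```python
import numpy as np

def tr_C(op, dC, dV):
    """Partial trace over C for an operator on C (x) V (C is the first factor)."""
    return np.trace(op.reshape(dC, dV, dC, dV), axis1=0, axis2=2)

def ext(Pi, dC):
    """Extend an operator on V to C (x) V by tensoring with the identity on C."""
    return np.kron(np.eye(dC), Pi)

def f_map(rhos, V, psi, c, dC, dV):
    """Return [rho_a, c*D_a, ..., c*D_1] where D_{i+1} is the i-th consistency defect."""
    a = len(rhos)
    out = [rhos[-1]]
    for i in range(a - 1, 0, -1):      # D_{i+1} = Tr_C(rho_{i+1}) - Tr_C(V_i rho_i V_i^*)
        Vi = V[i - 1]
        out.append(c * (tr_C(rhos[i], dC, dV)
                        - tr_C(Vi @ rhos[i - 1] @ Vi.conj().T, dC, dV)))
    psi_V = tr_C(np.outer(psi, psi.conj()), dC, dV)
    out.append(c * (tr_C(rhos[0], dC, dV) - np.trace(rhos[0]) * psi_V))
    return out

def f_adjoint(P, Pis, V, psi, c, dC, dV):
    """Return [G_1, ..., G_a], the block of the adjoint paired with each rho_i."""
    a = len(Pis)
    G = []
    for i in range(1, a + 1):
        g = c * ext(Pis[i - 1], dC)
        if i < a:
            Vi = V[i - 1]
            g = g - c * Vi.conj().T @ ext(Pis[i], dC) @ Vi
        else:
            g = g + P                  # only the last block contains P
        if i == 1:
            g = g - c * np.vdot(psi, ext(Pis[0], dC) @ psi) * np.eye(dC * dV)
        G.append(g)
    return G

def inner(xs, ys):
    """Sum of Hilbert-Schmidt inner products <X, Y> = Tr(X^* Y)."""
    return sum(np.trace(x.conj().T @ y) for x, y in zip(xs, ys))
```

Because both maps are linear, the identity can be tested with arbitrary Hermitian inputs rather than only with density and measurement operators.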

Proposition 9. The oracle-algorithm presented in Figure 4 approximates $\lambda(R)$ to precision $\delta$ (Problem 1). Assuming unit cost for the oracle, this algorithm can be implemented in parallel with run time bounded by a polynomial in $a + b$, $1/\delta$, and $\log(\dim(\mathcal{CV}))$.

Proof. First, we note that the fact that each loss matrix $M_i^{(t)}$ satisfies $0 \preceq M_i^{(t)} \preceq \frac{1}{a} I$ follows immediately from its definition in step 2d and the observation that the adjoint mapping satisfies

$$\left( 0, -\tfrac{a}{\varepsilon} I, \dots, -\tfrac{a}{\varepsilon} I, -\tfrac{2a}{\varepsilon} I \right) \preceq f^*_{R,\varepsilon}(P, \Pi_a, \dots, \Pi_1) \preceq \left( \left(1 + \tfrac{a}{\varepsilon}\right) I, \tfrac{a}{\varepsilon} I, \dots, \tfrac{a}{\varepsilon} I \right).$$

For each $i = 1, \dots, a$ it is clear that the construction of the density operators $\rho_i^{(t)}$ in terms of the loss matrices $M_i^{(t)}$ presented in Figure 4 obeys the condition of Theorem 8. It therefore follows that for any density operator $\rho_i \in \mathrm{Dens}(\mathcal{CV})$ we have

Summing these inequalities over all $i$ we find that for any density operators $(\rho_1, \dots, \rho_a)$ it holds that

Substituting the definition of the loss matrices $M_i^{(t)}$ from step 2d and simplifying, we obtain

Substituting the choice of $\gamma, T$ from step 1 we see that the error term on the right side is at most $\delta/2$. Since this inequality holds for any choice of $(\rho_1, \dots, \rho_a)$ it certainly holds for the optimal choice, from which it follows that the right side is at most $\mu_\varepsilon(R) + \delta/2$. By construction each $(P^{(t)}, \Pi_a^{(t)}, \dots, \Pi_1^{(t)})$ is a $\delta/2$-best response to $(\rho_1^{(t)}, \dots, \rho_a^{(t)})$ so it must be that the left side of this inequality is at least $\mu_\varepsilon(R) - \delta/2$. It then follows from item 1 of Theorem 5 (Rounding theorem) and the choice $\varepsilon = \delta/2$ that $|\tilde{\lambda} - \lambda(R)| \le \delta$ as desired.

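Next we argue that the density operator $\rho'_a$ returned in step 4 is $3\delta/2$-optimal for $\lambda(R)$. By item 3 of Theorem 5 it suffices to argue that $(\bar{\rho}_1, \dots, \bar{\rho}_a)$ are $\delta$-optimal for $\mu_\varepsilon(R)$. To this end, choose any $(P, \Pi_1, \dots, \Pi_a)$. Since each $(P^{(t)}, \Pi_a^{(t)}, \dots, \Pi_1^{(t)})$ is a $\delta/2$-best response to $(\rho_1^{(t)}, \dots, \rho_a^{(t)})$ it holds that the inner product

$$\left\langle (\rho_1^{(t)}, \dots, \rho_a^{(t)}), f^*_{R,\varepsilon}(P^{(t)}, \Pi_a^{(t)}, \dots, \Pi_1^{(t)}) \right\rangle$$

can increase by no more than $\delta/2$ when $(P, \Pi_a, \dots, \Pi_1)$ is substituted for $(P^{(t)}, \Pi_a^{(t)}, \dots, \Pi_1^{(t)})$. It then follows from the above expression for $\tilde{\lambda}$ that

$$\frac{1}{T} \sum_{t=1}^{T} \left\langle (\rho_1^{(t)}, \dots, \rho_a^{(t)}), f^*_{R,\varepsilon}(P, \Pi_a, \dots, \Pi_1) \right\rangle \le \tilde{\lambda} + \delta/2 \le \mu_\varepsilon(R) + \delta$$

and hence $(\bar{\rho}_1, \dots, \bar{\rho}_a)$ is $\delta$-optimal for $\mu_\varepsilon(R)$ as desired.

Next we argue that the operator $\bar{P}$ returned in step 4 is $3\delta/2$-optimal for $\lambda(R)$. By item 2 of Theorem 5 it suffices to argue that $(\bar{P}, \bar{\Pi}_a, \dots, \bar{\Pi}_1)$ are $\delta$-optimal for $\mu_\varepsilon(R)$. To this end, choose any $(\rho_1, \dots, \rho_a)$. It follows from the above expression for $\tilde{\lambda}$ that

$$\left\langle (\rho_1, \dots, \rho_a), f^*_{R,\varepsilon}(\bar{P}, \bar{\Pi}_a, \dots, \bar{\Pi}_1) \right\rangle \ge \tilde{\lambda} - \delta/2 \ge \mu_\varepsilon(R) - \delta$$

and hence $(\bar{P}, \bar{\Pi}_a, \dots, \bar{\Pi}_1)$ is $\delta$-optimal for $\mu_\varepsilon(R)$ as desired.

The efficiency of this algorithm is not difficult to argue. Each individual step consists only of matrix operations that are known to admit an efficient parallel implementation. Efficiency then follows from the observation that the number $T$ of iterations is polynomial in $a + b$, $1/\delta$, and $\log(\dim(\mathcal{CV}))$. □

1. Let $\varepsilon = \delta/2$, let $\gamma = \frac{\varepsilon\delta}{16a^2}$, and let $T = \left\lceil \frac{\ln(\dim(\mathcal{CV}))}{\gamma^2} \right\rceil$. Let $W_i^{(1)} = I_{\mathcal{CV}}$ for each $i = 1, \dots, a$.

2. Repeat for each $t = 1, \dots, T$:

(a) For $i = 1, \dots, a$: Compute the updated density operators $\rho_i^{(t)} = W_i^{(t)} / \mathrm{Tr}(W_i^{(t)})$.

(b) For $i = 0, \dots, a-1$: Compute the projection $\Pi_{i+1}^{(t)}$ onto the positive eigenspace of $\mathrm{Tr}_{\mathcal{C}}(\rho_{i+1}^{(t)}) - \mathrm{Tr}_{\mathcal{C}}(V_i \rho_i^{(t)} V_i^*)$.

(c) Use the oracle to obtain a $\delta/2$-optimal solution $P^{(t)}$ to the weak optimization problem for $\mathbf{P}(R)$ (Problem 2) on input $\rho_a^{(t)}$.

(d) Compute the loss matrices

$$\left( M_1^{(t)}, \dots, M_a^{(t)} \right) \leftarrow \frac{\varepsilon}{4a^2} \left[ f^*_{R,\varepsilon}\left( P^{(t)}, \Pi_a^{(t)}, \dots, \Pi_1^{(t)} \right) + \frac{2a}{\varepsilon} \left( I_{\mathcal{CV}}, \dots, I_{\mathcal{CV}} \right) \right]$$

so that each loss matrix $M_i^{(t)}$ satisfies $0 \preceq M_i^{(t)} \preceq \frac{1}{a} I$.

(e) Update each weight matrix according to the standard MMW update rule:

$$W_i^{(t+1)} \leftarrow \exp\left( -\gamma \left( M_i^{(1)} + \cdots + M_i^{(t)} \right) \right).$$

3. Return

$$\tilde{\lambda} = \frac{1}{T} \sum_{t=1}^{T} \left\langle (\rho_1^{(t)}, \dots, \rho_a^{(t)}), f^*_{R,\varepsilon}\left( P^{(t)}, \Pi_a^{(t)}, \dots, \Pi_1^{(t)} \right) \right\rangle$$

as the $\delta$-approximation to $\lambda(R)$.

4. If optimal strategies are desired then compute

$$(\bar{\rho}_1, \dots, \bar{\rho}_a) = \frac{1}{T} \sum_{t=1}^{T} \left( \rho_1^{(t)}, \dots, \rho_a^{(t)} \right) \quad \text{and} \quad (\bar{P}, \bar{\Pi}_a, \dots, \bar{\Pi}_1) = \frac{1}{T} \sum_{t=1}^{T} \left( P^{(t)}, \Pi_a^{(t)}, \dots, \Pi_1^{(t)} \right),$$

both of which are $\delta$-optimal for $\mu_\varepsilon(R)$.

Compute $(\rho'_1, \dots, \rho'_a)$ from $(\bar{\rho}_1, \dots, \bar{\rho}_a)$ as described in item 3 of Theorem 5. Return $\rho'_a$ and $\bar{P}$, both of which are $3\delta/2$-optimal for $\lambda(R)$.

Figure 4: An efficient parallel oracle-algorithm for approximating $\lambda(R)$ (Problem 1).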

6.3 Special case: semidefinite programs on consistent density operators, a direct simulation 
of QIP 

Consider a special case of the problem of approximating $\lambda(R)$ (Problem 1) in which $b = 0$. Since there is no interaction with Bob, this scenario corresponds to an ordinary, single-prover quantum interactive proof. In this case, $\mathbf{P}(R) = \{V_a^* \Pi V_a\}$ is a singleton set and the expression (7) for $\lambda(R)$ simplifies to

$$\lambda(R) = \min_{\substack{(\rho_1, \dots, \rho_a) \\ \text{consistent with } R}} \left\langle \rho_a, V_a^* \Pi V_a \right\rangle,$$

which is a semidefinite program whose feasible region consists of density operators $\rho_1, \dots, \rho_a$ consistent with $R$. The SDP (1) from the beginning of this paper is recovered by substituting the explicit conditions for consistency with referee $R$ listed in Proposition 4.

Since $\mathbf{P}(R)$ is a singleton set, the oracle is trivial to implement, so it follows immediately from Proposition 9 that the algorithm presented in Figure 4 can be used to solve SDPs of this form efficiently in parallel and thus prove QIP = PSPACE via direct simulation of a multi-message quantum interactive proof.

Later we will see that the oracle for weak optimization for $\mathbf{P}(R)$ (Problem 2) required for general instances of $\lambda(R)$ can be reduced to an instance of this SDP special case of Problem 1 plus some post-processing.

7 Implementation of the oracle 

In Section 6 we presented a parallel oracle-algorithm (Figure 4) for the problem of approximating $\lambda(R)$ (Problem 1) and proved its correctness and efficiency (Proposition 9). In order to complete the description of our algorithm for double quantum interactive proofs it remains only to describe the implementation of the oracle for weak optimization for $\mathbf{P}(R)$ (Problem 2). In this section we establish the following.

Proposition 10. The weak optimization problem (Problem 2) for the set $\mathbf{P}(R)$ specified in Eq. (6) admits a parallel algorithm with run time bounded by a polynomial in $a + b$, $1/\delta$, and $\log(\dim(\mathcal{CV}))$.

It follows that the algorithm of Figure 4 is an unconditionally efficient parallel algorithm for approximating $\lambda(R)$ (Problem 1).

As mentioned earlier, this instance of Problem 2 will be rephrased as a new instance of Problem 1 (plus some post-processing) so that the algorithm of Section 6 can be reused in the implementation of our oracle. Incidentally, we shall see that this new instance of Problem 1 has the special SDP form described in Section 6.3.
Choose any state $\rho$ and suppose that a (possibly cheating) Alice was somehow able to make it so that the registers $(\mathsf{C}, \mathsf{V})$ are in state $\rho$ after the interaction with Alice. Let $\mathsf{A}$ be a register large enough to admit a purification of $\rho$ and let $|\varphi\rangle \in \mathcal{ACV}$ be any such purification. If Bob acts according to $(B_1, \dots, B_b)$ then (similar to Eq. (3)) his probability of victory is

$$\Pr[\text{Bob wins} \mid \rho, (B_1, \dots, B_b)] = \left\| \Pi V_{a+b} B_b V_{a+b-1} B_{b-1} \cdots B_1 V_a |\varphi\rangle \right\|^2.$$

Notice that this quantity also represents the probability of victory in a different, one-player game with a referee $R'$ whose initial state is $V_a |\varphi\rangle$. (Formally, the referee $R'$ exchanges $b$ rounds of messages with one of the players and zero messages with the other.) The unitaries $B_1, \dots, B_b$ could specify actions for either Alice or Bob, a choice that depends only upon how we label the components of the referee $R'$.

Since our goal is to reduce Problem 2 to an instance of the SDP special case of Problem 1, it befits us to view $B_1, \dots, B_b$ as actions for Alice in the game with referee $R'$. Let us write

$$R' = \left( V_a |\varphi\rangle, V'_1, \dots, V'_b, \Pi' \right)$$

where $V'_i = V_{a+i} \otimes I_{\mathcal{A}}$ for each $i = 1, \dots, b$ and $\Pi' = (I - \Pi) \otimes I_{\mathcal{A}}$. The private memory register $\mathsf{V}'$ of the new referee $R'$ is identified with the registers $(\mathsf{V}, \mathsf{A})$ and the message register $\mathsf{C}'$ of the new referee is identified with $\mathsf{C}$. In this case, the set $\mathbf{P}(R') = \{Q\}$ is a singleton set with $Q = V_b'^* \Pi' V'_b$. Each choice of unitaries $B_1, \dots, B_b$ induces both a measurement operator $P \in \mathbf{P}(R)$ and a state $\xi \in \mathbf{A}(R')$ with

$$\langle \rho, P \rangle = \left\| \Pi V_{a+b} B_b V_{a+b-1} B_{b-1} \cdots B_1 V_a |\varphi\rangle \right\|^2 = 1 - \langle \xi, Q \rangle$$

and therefore

$$\max_{P \in \mathbf{P}(R)} \langle \rho, P \rangle = 1 - \lambda(R') = 1 - \min_{\xi \in \mathbf{A}(R')} \langle \xi, Q \rangle.$$

Moreover, if $P \in \mathbf{P}(R)$ achieves the maximum on the left side then the unitaries $B_1, \dots, B_b$ that induce $P$ also induce a state $\xi \in \mathbf{A}(R')$ that achieves the minimum on the right side. As the right side is an instance of the SDP special case of Problem 1, a solution to Problem 2 presents itself:

1. Use the algorithm of Figure 4 to find $\xi \in \mathbf{A}(R')$ minimizing $\langle \xi, Q \rangle$.

2. Find the unitaries $B_1, \dots, B_b$ that induce $\xi$. These unitaries also induce a measurement operator $P \in \mathbf{P}(R)$ maximizing $\langle \rho, P \rangle$. Compute $P$ using $B_1, \dots, B_b$ via standard matrix multiplication.

We already saw in Section 6 how the algorithm of Figure 4 can be used to accomplish step 1. In the remainder of this section we fill in the details for step 2. Recall that the algorithm of Figure 4 finds near-optimal density operators $\xi_1, \dots, \xi_b$ consistent with the referee $R' = (V_a |\varphi\rangle, V'_1, \dots, V'_b, \Pi')$, meaning that $\mathrm{Tr}_{\mathcal{C}}(\xi_{i+1}) = \mathrm{Tr}_{\mathcal{C}}(V'_i \xi_i V_i'^*)$ for each $i = 0, \dots, b-1$ where $V'_0 = V_a \otimes I_{\mathcal{A}}$ and $\xi_0 = |\varphi\rangle\langle\varphi|$ for convenience. The following algorithm finds the unitaries $B_1, \dots, B_b$.

1. Let $\mathcal{B}$ be a space large enough to admit purifications of $\xi_1, \dots, \xi_b$ and write $|\alpha_0\rangle = |\varphi\rangle |0_{\mathcal{B}}\rangle$.

2. For each $i = 1, \dots, b$:

(a) Compute a purification $|\alpha_i\rangle \in \mathcal{ACVB}$ of $\xi_i$.

(b) Compute a unitary $B_i \in \mathrm{U}(\mathcal{CB})$ that maps $V'_{i-1} |\alpha_{i-1}\rangle$ to $|\alpha_i\rangle$.

3. Return the desired unitaries $(B_1, \dots, B_b)$.

It is straightforward to verify the correctness of the above algorithm for step 2, and hence of the whole algorithm when the two steps are composed. It remains to show that the above algorithm admits an efficient parallel implementation. Again, such efficiency comes from the efficiency of each step and the fact that the number of steps is polynomial in $a + b$. The only non-standard matrix operations involved are computing purifications and computing a unitary that maps one purification to another. These operations, as well as possible precision issues, are handled explicitly in the previous version of this paper [GW10, Lemma 3-4, Appendix B].
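The two non-standard operations named above can be sketched with standard linear algebra. The following is a minimal illustration and not the construction of [GW10]: it purifies a density operator on a space $\mathcal{X}$ using an ancilla $\mathcal{Y}$ of the same dimension, and it finds a unitary on $\mathcal{Y}$ mapping one purification to another via a singular value decomposition. The function names and the choice of ancilla are assumptions made for this example, and precision issues are ignored.

```python
import numpy as np

def purify(rho):
    """Return |alpha> in X (x) Y with Tr_Y |alpha><alpha| = rho, where dim Y = dim X."""
    w, v = np.linalg.eigh(rho)
    w = np.clip(w, 0.0, None)                  # remove tiny negative rounding errors
    d = rho.shape[0]
    alpha = np.zeros(d * d, dtype=complex)
    for j in range(d):
        e_j = np.zeros(d)
        e_j[j] = 1.0
        alpha += np.sqrt(w[j]) * np.kron(v[:, j], e_j)
    return alpha

def trace_out_second(op, dX, dY):
    """Partial trace over the second factor of an operator on X (x) Y."""
    return np.trace(op.reshape(dX, dY, dX, dY), axis1=1, axis2=3)

def unitary_between(alpha, beta, dX, dY):
    """Unitary U on Y with (I_X (x) U)|alpha> = |beta>, for purifications of one state on X."""
    A = alpha.reshape(dX, dY)                  # coefficient matrices of the two vectors
    B = beta.reshape(dX, dY)
    W, _, Zh = np.linalg.svd(A.conj().T @ B)
    return (W @ Zh).T                          # (I (x) U) acts on coefficients as A -> A U^T
```

In the algorithm above the unitary $B_i$ acts on $(\mathsf{C}, \mathsf{B})$ while the purified registers are $(\mathsf{A}, \mathsf{V})$; the sketch captures the same idea with $\mathcal{X}$ as the untouched part and $\mathcal{Y}$ as the part acted upon.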



8 Containment of DQIP inside PSPACE

In this section we explain how our parallel algorithm for double quantum interactive proofs implies the complexity theoretic equality DQIP = DIP = PSPACE. Formally, a decision problem $L$ is said to admit a double quantum interactive proof with completeness $c(|x|)$ and soundness $s(|x|)$ if there exists a polynomial-time uniform quantum referee $R_x$ such that

The complexity class DQIP consists of all decision problems $L$ that admit double quantum interactive proofs with completeness $c(|x|)$ and soundness $s(|x|)$ for which there exists a polynomial-bounded function $p$ such that $c - s \ge 1/p$. The class DIP is defined similarly except that the referee is classical. By definition it holds that DIP $\subseteq$ DQIP.

6 The roles of no-instances and yes-instances in this definition are the opposite of convention, an artifact of our decision to define the quantity $\lambda(R)$ in terms of the probability of victory for Bob, as opposed to Alice. This cosmetic choice was made so as to better facilitate the use of the MMW.

Proposition 11. DQIP $\subseteq$ PSPACE.

Proof. Let $L$ be any decision problem in DQIP and let $R_x$ denote the referee witnessing this fact. Let $x$ be any input string and consider the following algorithm for deciding whether $x$ is a yes-instance or a no-instance of $L$.

1. Compute an explicit description of the referee $R_x = (|\psi\rangle, V_1, \dots, V_{a+b}, \Pi)$.

2. Compute a $\delta$-approximation of the quantity $\lambda(R_x)$ by running an efficient parallel implementation of the algorithm of Figure 4 for the choice $\delta = (c - s)/3$ and accept or reject accordingly.

The first step requires only simple matrix multiplication and can therefore be implemented by standard parallel algorithms with run time bounded by a polynomial in $\log(\dim(\mathcal{CV}))$. Proposition 10 establishes the same for the second step given the promise $c - s \ge 1/\mathrm{poly}$. The entire process can then be simulated in polynomial space by standard methods (NC(poly) = PSPACE [Bor77]), from which it follows that $L \in$ PSPACE and hence DQIP $\subseteq$ PSPACE. □

The characterization

DQIP = DIP = PSPACE

now follows immediately from the well-known fact that IP = PSPACE and from the trivial containment IP $\subseteq$ DIP.

Often in the study of interactive proofs the precise values of the completeness and soundness parameters 
$c, s$ are immaterial because sequential repetition (or sometimes parallel repetition) can be used to transform 
any interactive proof with $c - s > 1/\mathrm{poly}$ into another interactive proof in which $c$ tends toward one and 
$s$ tends toward zero exponentially quickly in the bit length of the input string $x$. For this reason, it is 
typical to assume without loss of generality that $c, s$ are constants such as 2/3 and 1/3 or that $1 - c$ and 
$s$ are exponentially small whenever it is convenient to do so. However, it is not immediately clear from 
their definition that double quantum interactive proofs are robust with respect to the choice of $c, s$, so we 
must be as inclusive as possible when defining the classes DIP and DQIP. Fortunately, our algorithm for 
double quantum interactive proofs does not require any extra promise on $c, s$ beyond the standard condition 
$c - s > 1/\mathrm{poly}$. 
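
As a rough illustration of why repetition drives $c$ toward one and $s$ toward zero for ordinary interactive proofs, the following sketch computes the completeness and soundness of $k$ runs combined by a threshold vote. It assumes the runs behave as independent trials and that the verifier accepts when at least a $(c+s)/2$ fraction of runs accept; the function names are hypothetical, and nothing here is claimed to apply to double quantum interactive proofs.

```python
from math import ceil, comb


def binom_tail_ge(p: float, k: int, m: int) -> float:
    """Return P[Binomial(k, p) >= m]."""
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i)
               for i in range(max(m, 0), k + 1))


def amplified(c: float, s: float, k: int) -> tuple[float, float]:
    """Completeness and soundness after k independent runs.

    Assumption (not from the paper): the combined verifier accepts
    iff at least ceil(k * (c + s) / 2) of the k runs accept.
    """
    m = ceil(k * (c + s) / 2)
    new_c = binom_tail_ge(c, k, m)  # accept probability on a yes-instance
    new_s = binom_tail_ge(s, k, m)  # accept probability on a no-instance
    return new_c, new_s
```

For example, `amplified(2/3, 1/3, 101)` returns a completeness close to one and a soundness close to zero, and both errors shrink exponentially as `k` grows.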

A fortunate corollary of the collapse of DQIP and DIP to PSPACE is that these classes are fully robust 
with respect to the choice of $c, s$. That is, if a decision problem $L$ admits a double quantum interactive proof 
with $c - s > 1/p$ for some polynomial $p$, then $L$ also admits a double quantum interactive proof with $c = 1$ and $s < 2^{-q}$ for any desired 
polynomial-bounded function $q(x)$. However, the method by which the original game is transformed into 
the low-error game is very circuitous: the original game must be solved in polynomial space according 
to Proposition 11 and then that polynomial-space computation must be converted back into an interactive 
proof with perfect completeness and exponentially small soundness according to proofs of IP = PSPACE. 
It would be nice to know whether a more straightforward transformation such as parallel repetition followed 
by a majority vote could be used to reduce error for double quantum interactive proofs and other bounded- 
turn quantum games. 

9 Extensions 

9.1 Finding near-optimal strategies 

Thus far we have concerned ourselves primarily with the problem of approximating the value $\lambda(R)$ of a 
double quantum interactive proof. But it is not difficult to extend our result so as to solve the related search 
problem of finding near-optimal strategies for the players. Indeed, step 4 of the algorithm of Figure 4 
returns a transcript $(\rho'_1, \dots, \rho'_a)$ and a measurement operator $P \in \mathbf{P}(R)$, both of which are $3\delta/2$-optimal 
for the formulation of $\lambda(R)$ given in Eq. (7). The unitaries $A_1, \dots, A_a$ for Alice can be recovered from the 
transcript $(\rho'_1, \dots, \rho'_a)$ via the method described in Section 7 with no additional complication. 

It is only slightly more difficult to recover Bob's unitaries $B_1, \dots, B_b$ from $P$. Our definition of Problem 
2 (Weak optimization for $\mathbf{P}(R)$) specifies only that a solution produce a near-optimal measurement operator 
$P^{(t)}$ for a given state $\rho^{(t)}$. But the solution to Problem 2 described in Section 7 produces $P^{(t)}$ by first 
constructing the associated unitaries $B_1^{(t)}, \dots, B_b^{(t)}$. It is a simple matter to modify our definition of Problem 
2 so as to also return those unitaries in addition to the desired measurement operator. 

The near-optimal measurement operator $P$ returned in step 4 of the algorithm of Figure 4 is given by 

$$P = \frac{1}{T} \sum_{t=1}^{T} P^{(t)},$$

which indicates a strategy for Bob that selects $t \in \{1, \dots, T\}$ uniformly at random and then acts according 
to $B_1^{(t)}, \dots, B_b^{(t)}$. It is a simple matter to construct unitaries $B_1, \dots, B_b$ that implement this probabilistic 
strategy by sampling the integer $t$ during the first round, recording that integer in Bob's private memory 
(which must be enlarged slightly to make room for it), and controlling the operation in subsequent turns 
on the contents of that integer. All of the matrix operations required to construct $B_1, \dots, B_b$ from each 
$B_1^{(t)}, \dots, B_b^{(t)}$ in this way can be implemented efficiently in parallel. 
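
The two matrix operations behind this construction can be sketched as follows. This is a minimal illustration, not the paper's stated construction: it assumes the uniform mixture over $t$ is realized by averaging the per-iteration operators, and that "controlling on the contents of that integer" is realized by a block-diagonal matrix whose $t$-th block is the unitary for iteration $t$, with the control register ordered before the target register. Both function names are hypothetical.

```python
def average_operators(ops):
    """Entrywise average (1/T) * sum_t ops[t] of equally sized square matrices."""
    T = len(ops)
    n = len(ops[0])
    return [[sum(op[i][j] for op in ops) / T for j in range(n)]
            for i in range(n)]


def controlled_block(unitaries):
    """Block-diagonal matrix sum_t |t><t| (x) U_t.

    Applies unitaries[t] to the target register whenever the control
    register (Bob's extra private memory) holds the basis state |t>.
    """
    T = len(unitaries)
    n = len(unitaries[0])
    out = [[0j] * (T * n) for _ in range(T * n)]
    for t, U in enumerate(unitaries):
        for i in range(n):
            for j in range(n):
                # Place U_t in the t-th diagonal block.
                out[t * n + i][t * n + j] = U[i][j]
    return out
```

Each entry of either result depends on only one entry per input matrix, so both operations parallelize trivially.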

9.2 Arbitrary payoff observables 

In this paper we restricted attention to win-lose zero-sum games wherein the referee's measurement $\{\Pi, I - \Pi\}$ 
at the end of the game indicates only a winner without specifying payouts to the players. In general, 
the referee's final measurement $\{\Pi_a\}_{a \in \Sigma}$ could have outcomes belonging to some arbitrary finite set $\Sigma$. In 
this case, the referee awards payouts to the players according to a payout function $v : \Sigma \to \mathbb{R}$ where $v(a)$ 
denotes the payout to Alice in the event of outcome $a$. (Since the game is zero-sum, Bob's payout must be 
$-v(a)$.) Jain and Watrous describe a simple transformation by which their algorithm for one-turn games can 
be used to compute the expected payout in this more general setting [JW09]. Their transformation extends 
without complication to our games. 

In our case, the expected payout to Alice when she and Bob play according to $(A_1, \dots, A_a)$ and 
$(B_1, \dots, B_b)$, respectively, is given by 

$$\sum_{a \in \Sigma} v(a) \langle\phi|\Pi_a|\phi\rangle = \langle\phi|\Pi_\Sigma|\phi\rangle$$

where 

$$|\phi\rangle = V_{a+b} B_b V_{a+b-1} B_{b-1} \cdots B_1 V_a A_a V_{a-1} A_{a-1} \cdots A_2 V_1 A_1 |\psi\rangle$$

is the final state of the game and the Hermitian operator $\Pi_\Sigma = \sum_{a \in \Sigma} v(a) \Pi_a$ denotes the payout observable 
induced by the referee. The expected payout of this game can be computed simply by translating and 
rescaling $\Pi_\Sigma$ so as to obtain a measurement operator $0 \preceq \Pi \preceq I$ and then running our algorithm for double 
quantum interactive proofs with referee $R = (|\psi\rangle, V_1, \dots, V_{a+b}, \Pi)$. The expected payout of the original 
game is then obtained by inverting the scaling and translation operations by which $\Pi$ was obtained from 
$\Pi_\Sigma$. As noted by Jain and Watrous, this transformation has the effect of inflating the additive approximation 
error $\delta$ by a factor of $\|\Pi_\Sigma\|$, which is the maximum absolute value of any given payout. 
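
The translate-and-rescale step can be sketched at the level of the payout function alone. The text does not say which affine map is used, so the sketch below assumes one particular choice: shift by the smallest payout and divide by the payout range. Because the measurement operators sum to the identity, applying this map to each $v(a)$ corresponds to the same affine map on the payout observable. The function names are hypothetical.

```python
def rescale_payouts(v):
    """Map payouts affinely into [0, 1].

    Returns (scaled, shift, scale) with scaled[a] = (v[a] - shift) / scale.
    Assumption: shift = min payout, scale = max payout - min payout.
    """
    lo, hi = min(v.values()), max(v.values())
    scale = hi - lo
    if scale == 0:
        # Constant payout: every outcome maps to 0 and the shift carries the value.
        return {a: 0.0 for a in v}, lo, 1.0
    return {a: (x - lo) / scale for a, x in v.items()}, lo, scale


def invert_value(value, shift, scale):
    """Recover an expected payout from the value of the rescaled win-lose game."""
    return scale * value + shift


def expected(payouts, probs):
    """Expected payout under an outcome distribution probs[a]."""
    return sum(payouts[a] * probs[a] for a in payouts)
```

Under this choice an additive error of $\delta$ in the rescaled game becomes an error of `scale * delta` in the original game, and `scale` is at most twice the largest absolute payout, which matches the inflation noted above up to a constant factor.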